History log of /netbsd-current/sys/kern/uipc_usrreq.c
Revision Date Author Comments
# 1.203 28-May-2022 andvar

s/grabing/grabbing/ in comments.


# 1.202 09-Apr-2022 riastradh

unix(4): Convert membar_exit to membar_release.

Use atomic_load_consume or atomic_load_relaxed where necessary.

Comment on why unlocked nonatomic access is valid where it is done.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 08-Aug-2021 nia

introduce a SOL_LOCAL for unix-domain socket level socket options
as an alias of the current 0 used for these options, as in FreeBSD.

reviewed by many.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 06-Nov-2020 christos

PR/55777: Ruslan Nikolaev: Move the unp_sysctl_create to uipc_usrreq.c to
facilitate splitting rump modules without requiring a dummy function.


# 1.199 26-Aug-2020 christos

branches: 1.199.2;
add socket info for user and group for unix sockets in fstat.


Revision tags: bouyer-xenpvh-base2
# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

branches: 1.197.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
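
As a rough user-space analogue of the release/consume pairing described in
revisions 1.195 and 1.196, the C11 sketch below publishes a pointer with a
release store and reads it back with a consume load. It is not the kernel
code: toy_table, current_table, publish_table and read_nfiles are
hypothetical names, and C11 <stdatomic.h> is assumed here as a stand-in for
the kernel's atomic_store_release and atomic_load_consume.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a descriptor table; not struct fdtab. */
struct toy_table {
	int nfiles;
};

/* Shared pointer that readers may load without taking a lock. */
static _Atomic(struct toy_table *) current_table;

/*
 * Writer side: fill in the table first, then publish the pointer with
 * release ordering so the initialisation is visible to any reader that
 * observes the new pointer.
 */
static void
publish_table(struct toy_table *t)
{
	atomic_store_explicit(&current_table, t, memory_order_release);
}

/*
 * Reader side: load the pointer with consume ordering before
 * dereferencing it. Returns -1 if nothing has been published yet.
 */
static int
read_nfiles(void)
{
	struct toy_table *t;

	t = atomic_load_explicit(&current_table, memory_order_consume);
	return t == NULL ? -1 : t->nfiles;
}
```

A plain NULL test that never dereferences the pointer does not need the
ordering, which matches the exception noted in revision 1.196.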


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.2; 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
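
The padding scrub mentioned in revision 1.141 can be illustrated with the
standard CMSG_LEN and CMSG_SPACE macros: CMSG_LEN(n) covers the header plus
n payload bytes, CMSG_SPACE(n) additionally covers trailing alignment
padding, and the bytes in between are what would otherwise carry stale
memory. The helper below is a user-space sketch under that assumption;
cmsg_scrub_padding is a hypothetical name and this is not the code in
unp_externalize.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Zero the alignment padding that follows a control message carrying
 * payload_len bytes of data. cmsg_buf points at the start of the
 * message (its struct cmsghdr) and must hold at least
 * CMSG_SPACE(payload_len) bytes.
 */
static void
cmsg_scrub_padding(unsigned char *cmsg_buf, size_t payload_len)
{
	size_t used = CMSG_LEN(payload_len);	/* header + payload */
	size_t total = CMSG_SPACE(payload_len);	/* ... + padding */

	/* A zero-length memset is harmless when there is no padding. */
	memset(cmsg_buf + used, 0, total - used);
}
```

Only the gap is cleared, so the header and payload already written into the
buffer are left untouched.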


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
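
From userland, the SOCK_CLOEXEC flag added for socketpair(2) in revision
1.136 can be exercised roughly as follows. This is a minimal sketch, not a
test from the tree; cloexec_socketpair_check is a hypothetical helper name,
and it assumes a system whose <sys/socket.h> defines SOCK_CLOEXEC.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a unix-domain socket pair with close-on-exec requested at
 * creation time, then read the descriptor flags back.
 * Returns 1 if both ends carry FD_CLOEXEC, 0 if not, -1 on error.
 */
static int
cloexec_socketpair_check(void)
{
	int sv[2];
	int f0, f1;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) == -1)
		return -1;

	f0 = fcntl(sv[0], F_GETFD);
	f1 = fcntl(sv[1], F_GETFD);
	close(sv[0]);
	close(sv[1]);

	if (f0 == -1 || f1 == -1)
		return -1;
	return (f0 & FD_CLOEXEC) != 0 && (f1 & FD_CLOEXEC) != 0;
}
```

Setting the flag at creation avoids the window between socketpair(2) and a
later fcntl(F_SETFD) in which another thread could fork and exec.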


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
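
The deferred-close scheme from revision 1.121 (unp_scan queues a file with a
per-file deferred close count, and the unp_gc thread later closes it that
many times) can be pictured with a small single-threaded toy model. This is
only an illustration of the idea: toy_file, defer_close and drain_deferred
are hypothetical names, a "close" is modelled as dropping one reference, and
the locking the commit message calls out is omitted entirely.

```c
#include <stddef.h>

/* Toy stand-in for a file object; not the kernel's file_t. */
struct toy_file {
	int refs;		/* outstanding references */
	int deferred_closes;	/* closes waiting for the collector */
	int queued;		/* nonzero while on the deferred list */
	struct toy_file *next;	/* link on the deferred list */
};

/* Global list of files with pending deferred closes. */
static struct toy_file *deferred_head;

/* Scanner side: never close directly, only record that a close is due. */
static void
defer_close(struct toy_file *fp)
{
	if (fp->queued) {
		/* Already on the list: just bump the count. */
		fp->deferred_closes++;
		return;
	}
	fp->queued = 1;
	fp->deferred_closes = 1;
	fp->next = deferred_head;
	deferred_head = fp;
}

/*
 * Collector side: keep taking files off the list until it is empty,
 * closing each one as many times as was recorded. Returns the total
 * number of closes performed.
 */
static int
drain_deferred(void)
{
	int total = 0;

	while (deferred_head != NULL) {
		struct toy_file *fp = deferred_head;
		int n = fp->deferred_closes;

		deferred_head = fp->next;
		fp->next = NULL;
		fp->queued = 0;
		fp->deferred_closes = 0;

		while (n-- > 0) {
			fp->refs--;	/* stands in for one close */
			total++;
		}
	}
	return total;
}
```

Because the collector loops until the list is empty, a close that queues
further files is handled by iteration rather than by recursive calls.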


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.
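
The two length checks from revisions 1.116 and 1.117 can be sketched in user
space as follows. This is an illustration under assumptions, not the kernel
code: cmsg_len_ok is a hypothetical helper, m_len stands in for
control->m_len, and CMSG_LEN(0) is used as a portable way to spell the
aligned header size, which is how common implementations define it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns true when a control message length is
 * plausible for a control buffer of m_len bytes.
 */
static bool
cmsg_len_ok(size_t cmsg_len, size_t m_len)
{
	/* Lower bound (1.117): at least an aligned header. */
	if (cmsg_len < CMSG_LEN(0))
		return false;
	/* Upper bound (1.116): may be shorter than the buffer, never longer. */
	if (cmsg_len > m_len)
		return false;
	return true;
}
```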


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
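
The effect of this change is visible from user space. The following is a
minimal sketch, assuming a writable directory for the socket path;
bound_socket_mode is a hypothetical helper, not part of any library. It
binds a local socket under a chosen umask and reports the permission bits
of the socket file that bind(2) created.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a local socket at path under the given umask
 * and return the permission bits of the resulting socket file, or -1.
 */
static int
bound_socket_mode(const char *path, mode_t mask)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int s, rv = -1;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length checked above */

	if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return -1;
	old = umask(mask);		/* applies to the bind() below */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    stat(path, &st) == 0)
		rv = (int)(st.st_mode & 0777);
	umask(old);			/* restore the caller's umask */
	close(s);
	return rv;
}
```

On a system that honors the umask, calling this with a mask of 077 should
return a mode with no group or other permission bits set.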


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
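
The undo-expand-retry loop described above can be modelled in user space
like this. It is a sketch under stated assumptions, not the kernel
implementation: toy_table, toy_fdalloc, toy_fdexpand and toy_externalize
are hypothetical stand-ins, there is no locking, and the f_msgcount and
unp_rights bookkeeping is omitted.

```c
#include <stdlib.h>
#include <string.h>

/* Toy descriptor table; a hypothetical stand-in for the kernel's filedesc. */
struct toy_table {
	void	**slots;	/* NULL marks a free slot */
	size_t	  nslots;
};

/* Non-blocking allocation: find a free slot, or report "no space". */
static int
toy_fdalloc(const struct toy_table *t, size_t *slotp)
{
	for (size_t i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*slotp = i;
			return 0;
		}
	}
	return -1;
}

/* Potentially blocking step: grow the table. */
static int
toy_fdexpand(struct toy_table *t)
{
	size_t n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, n * sizeof(*p));

	if (p == NULL)
		return -1;
	for (size_t i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/* Assign every file to a slot; on "no space", undo, expand, and retry. */
static int
toy_externalize(struct toy_table *t, void **files, size_t nfiles, int *fds)
{
	int *tmp = malloc((nfiles ? nfiles : 1) * sizeof(*tmp));
	size_t i, slot;

	if (tmp == NULL)
		return -1;
restart:
	for (i = 0; i < nfiles; i++) {
		if (toy_fdalloc(t, &slot) != 0) {
			while (i > 0)		/* undo assignments so far */
				t->slots[tmp[--i]] = NULL;
			if (toy_fdexpand(t) != 0) {
				free(tmp);
				return -1;
			}
			goto restart;
		}
		t->slots[slot] = files[i];
		tmp[i] = (int)slot;
	}
	if (nfiles > 0)			/* publish indexes only at the end */
		memcpy(fds, tmp, nfiles * sizeof(*tmp));
	free(tmp);
	return 0;
}
```

Keeping the indexes in a temporary array is what makes the retry cheap:
nothing is written to the caller's buffer until every file has a slot.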


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
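
The same three macros apply on the user side of descriptor passing. The
following is a minimal sketch, not code from this file: build_rights_cmsg
is a hypothetical helper, and the caller is assumed to supply a buffer that
is suitably aligned for struct cmsghdr.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: lay out nfds descriptors as a single SCM_RIGHTS
 * control message in buf. Returns the number of control bytes used,
 * or 0 if buf is too small.
 */
static size_t
build_rights_cmsg(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	size_t payload = nfds * sizeof(int);
	size_t space = CMSG_SPACE(payload);	/* header + data + padding */
	struct msghdr msg;
	struct cmsghdr *cm;

	if (buflen < space)
		return 0;

	memset(buf, 0, space);		/* do not leave padding uninitialized */
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = buf;
	msg.msg_controllen = space;

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(payload);	/* header + data, no trailing pad */
	memcpy(CMSG_DATA(cm), fds, payload);	/* CMSG_DATA(), never (cm + 1) */
	return space;
}
```

CMSG_SPACE() sizes the buffer and CMSG_LEN() fills in cmsg_len; mixing the
two up is exactly the kind of mistake that only shows on platforms where
the header size and the alignment differ.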


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
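
The mapping above swaps the first two arguments for the copy routines,
which is easy to get wrong by hand. As a hedged illustration only, the
rewrites can be spelled out as wrappers; the compat_* names are
hypothetical and do not appear in the tree.

```c
#include <string.h>

/* Hypothetical wrappers that spell out each rewrite from the list above. */
static void
compat_bcopy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);		/* regions must not overlap */
}

static void
compat_ovbcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);		/* overlap is allowed */
}

static int
compat_bcmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);	/* zero means equal */
}

static void
compat_bzero(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```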


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.202 09-Apr-2022 riastradh

unix(4): Convert membar_exit to membar_release.

Use atomic_load_consume or atomic_load_relaxed where necessary.

Comment on why unlocked nonatomic access is valid where it is done.


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 08-Aug-2021 nia

introduce a SOL_LOCAL for unix-domain socket level socket options
as an alias of the current 0 used for these options, as in FreeBSD.

reviewed by many.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 06-Nov-2020 christos

PR/55777: Ruslan Nikolaev: Move the unp_sysctl_create to uipc_usrreq.c to
facilitate splitting rump modules and does not require a dummy function.


# 1.199 26-Aug-2020 christos

branches: 1.199.2;
add socket info for user and group for unix sockets in fstat.


Revision tags: bouyer-xenpvh-base2
# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

branches: 1.197.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.2; 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
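
A toy model of the three effects listed above (count the overflow, set so_error to ENOBUFS, wake the receiver). struct sock_model and its field names are assumptions made for illustration; the real soroverflow() operates on struct socket and its receive buffer.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified socket state; not the kernel's struct socket. */
struct sock_model {
    unsigned long sb_overflowed;  /* overflow counter */
    int so_error;                 /* pending error for the next receive */
    int rcv_wakeups;              /* stands in for waking the receiver */
};

/* Model of what the commit says soroverflow() does. */
static void
soroverflow_model(struct sock_model *so)
{
    so->sb_overflowed++;
    so->so_error = ENOBUFS;
    so->rcv_wakeups++;
}

/* Usage: one overflow leaves one count, ENOBUFS pending, one wakeup. */
static void
soroverflow_example(void)
{
    struct sock_model so = { 0, 0, 0 };

    soroverflow_model(&so);
    assert(so.sb_overflowed == 1);
    assert(so.so_error == ENOBUFS);
    assert(so.rcv_wakeups == 1);
}
```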


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except when the length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
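
The scrubbing step can be illustrated with the standard CMSG_LEN and CMSG_SPACE macros. A minimal userland sketch, assuming the caller owns at least CMSG_SPACE(datalen) bytes starting at cm; cmsg_scrub_padding() is a hypothetical helper, not the code in unp_externalize.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Zero the alignment padding between the end of the payload
 * (CMSG_LEN) and the end of the reserved space (CMSG_SPACE), so that
 * stale bytes are not copied out with the control message.
 * Returns the number of bytes scrubbed (may be 0 on some ABIs).
 */
static size_t
cmsg_scrub_padding(struct cmsghdr *cm, size_t datalen)
{
    unsigned char *base = (unsigned char *)cm;
    size_t used = CMSG_LEN(datalen);
    size_t space = CMSG_SPACE(datalen);

    assert(space >= used);
    memset(base + used, 0, space - used);
    return space - used;
}
```

Only the padding is touched; the header and payload bytes before CMSG_LEN(datalen) are left as the caller wrote them.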


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
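
As an illustration of setting close-on-exec at creation time, a small userland sketch using the SOCK_CLOEXEC flag on socketpair(2). It assumes a system that provides SOCK_CLOEXEC (NetBSD after this change, Linux); the helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int
fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/*
 * Create a connected pair of unix-domain stream sockets that are
 * close-on-exec from the moment they exist, with no window between
 * creation and a later fcntl(F_SETFD).
 */
static int
cloexec_socketpair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv);
}

/* Usage: both ends report FD_CLOEXEC immediately after creation. */
static void
cloexec_example(void)
{
    int sv[2];

    assert(cloexec_socketpair(sv) == 0);
    assert(fd_is_cloexec(sv[0]) == 1);
    assert(fd_is_cloexec(sv[1]) == 1);
    close(sv[0]);
    close(sv[1]);
}
```

Setting the flag atomically at creation matters in multi-threaded programs, where another thread may fork and exec between socketpair() and a separate fcntl() call.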


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
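
The limit on descriptors in transit quoted above reduces to a single integer comparison. A minimal sketch of that predicate, with hypothetical parameter names mirroring the expression in the commit message; the values used in the kernel are not stated in this log.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when passing more descriptors should be refused, i.e. when
 * in_transit > maxfiles / ratio (integer division, as written in the
 * commit message).  ratio must be non-zero.
 */
static bool
unp_rights_over_limit(unsigned int in_transit, unsigned int maxfiles,
    unsigned int ratio)
{
    assert(ratio != 0);
    return in_transit > maxfiles / ratio;
}
```

With maxfiles = 1000 and a ratio of 2, for example, the 500th descriptor in transit is still allowed and the 501st is refused.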


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
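
Taken together with the upper bound from revision 1.116, the length check can be sketched as below. This is a userland approximation: cmsg_len_ok() is a hypothetical helper, m_len stands for control->m_len, and CMSG_LEN(0) is used as a portable spelling of CMSG_ALIGN(sizeof(struct cmsghdr)), which is how the BSDs and glibc define it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Accept a control message only if its claimed length covers at least
 * an aligned header and does not run past the buffer holding it.
 */
static bool
cmsg_len_ok(size_t cmsg_len, size_t m_len)
{
    return cmsg_len >= CMSG_LEN(0) && cmsg_len <= m_len;
}

/* Usage: one int of payload fits in a buffer sized with CMSG_SPACE. */
static void
cmsg_len_example(void)
{
    assert(cmsg_len_ok(CMSG_LEN(sizeof(int)), CMSG_SPACE(sizeof(int))));
    assert(!cmsg_len_ok(CMSG_LEN(0) - 1, CMSG_SPACE(sizeof(int))));
}
```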


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not handle.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.
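
The rule stated in this log message can be illustrated with a small userland sketch. The function names below are hypothetical and are not taken from uipc_usrreq.c; the sketch only shows that a one-past-the-end pointer may be formed and compared, while walking backwards is safer with an index than with a pointer that would end up before the array.

```c
#include <assert.h>
#include <stddef.h>

/* Forward walk: "end" points one element past the array, which is valid
 * to form and to compare against, as long as it is never dereferenced. */
static long
sum_forward(const int *a, size_t n)
{
	const int *end = a + n;
	long sum = 0;

	assert(a != NULL);
	for (const int *p = a; p != end; p++)
		sum += *p;
	return sum;
}

/* Backward walk: count down with an index rather than stepping a pointer
 * to a - 1, since forming a pointer before the array is undefined. */
static long
sum_backward(const int *a, size_t n)
{
	long sum = 0;

	assert(a != NULL);
	for (size_t i = n; i > 0; i--)
		sum += a[i - 1];
	return sum;
}
```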


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
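
A userland way to observe this behaviour is sketched below, assuming a system where bind(2) on a local socket applies the umask as described. The helper name bound_socket_mode() and the path passed to it are hypothetical; only the standard socket(2), umask(2), bind(2) and stat(2) calls are used.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a local socket at "path" under the given umask and report the
 * permission bits of the resulting socket file.  Returns 0 on success. */
static int
bound_socket_mode(const char *path, mode_t mask, mode_t *mode)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int fd, rc;

	assert(path != NULL && mode != NULL);
	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_LOCAL;
	strcpy(addr.sun_path, path);

	fd = socket(AF_LOCAL, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	(void)unlink(path);		/* remove a stale socket, if any */
	old = umask(mask);		/* the mask under test */
	rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	(void)umask(old);		/* restore the caller's mask */

	if (rc == 0)
		rc = stat(path, &st);
	if (rc == 0)
		*mode = st.st_mode & 0777;

	(void)close(fd);
	(void)unlink(path);
	return rc;
}
```

With a mask of 077, the group and other permission bits of the socket file are expected to be clear on systems that honor the umask here.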


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
((pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
the data in a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
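
As a rough illustration of this style of change, the sketch below walks a list with LIST_FOREACH from <sys/queue.h> instead of following the link members by hand. The struct item type and sum_items() are hypothetical names made up for the example and do not come from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; "link" holds the queue.h linkage. */
struct item {
	int value;
	LIST_ENTRY(item) link;
};
LIST_HEAD(item_list, item);

static int
sum_items(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	assert(head != NULL);
	/* Replaces an open-coded loop over lh_first / le_next. */
	LIST_FOREACH(it, head, link)
		sum += it->value;
	return sum;
}
```

The macro form keeps callers independent of how the list head and entry are laid out.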


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
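
The "larval" marking described in this entry can be sketched with a toy table. This is an illustration only: the toy_* names, the flag values and the fixed-size table are assumptions made for the example and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOTS	8
#define TOY_USED	0x1	/* slot is taken */
#define TOY_LARVAL	0x2	/* slot is taken but not fully constructed */

/* Hypothetical stand-in for a descriptor table. */
struct toy_fdtab {
	unsigned flags[TOY_NSLOTS];
	const char *obj[TOY_NSLOTS];
};

/* Allocation sees larval slots as filled, so it never hands them out twice. */
static int
toy_fdalloc(struct toy_fdtab *t)
{
	assert(t != NULL);
	for (int i = 0; i < TOY_NSLOTS; i++) {
		if (t->flags[i] == 0) {
			t->flags[i] = TOY_USED | TOY_LARVAL;
			return i;
		}
	}
	return -1;
}

/* Lookup treats a larval slot as a non-existent descriptor. */
static const char *
toy_getfile(const struct toy_fdtab *t, int fd)
{
	assert(t != NULL);
	if (fd < 0 || fd >= TOY_NSLOTS)
		return NULL;
	if ((t->flags[fd] & TOY_USED) == 0 || (t->flags[fd] & TOY_LARVAL) != 0)
		return NULL;
	return t->obj[fd];
}

/* "Maturing" a descriptor just clears the larval flag. */
static void
toy_set_mature(struct toy_fdtab *t, int fd)
{
	assert(t != NULL && fd >= 0 && fd < TOY_NSLOTS);
	t->flags[fd] &= ~TOY_LARVAL;
}
```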


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
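
The retry algorithm listed above can be sketched in userland with a toy slot table. Everything here is hypothetical (toy_alloc(), toy_expand(), toy_externalize() and struct toy_table are invented for the example); it only mirrors the steps in the log message: assign into a temporary array, undo and expand when the table is full, retry, and copy out at the end.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_table {		/* hypothetical stand-in for a filedesc table */
	int *used;
	int size;
};

/* Never blocks: returns ENOSPC when no slot is free. */
static int
toy_alloc(struct toy_table *t, int *slot)
{
	for (int i = 0; i < t->size; i++) {
		if (!t->used[i]) {
			t->used[i] = 1;
			*slot = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* The potentially blocking step; here it simply doubles the table. */
static int
toy_expand(struct toy_table *t)
{
	int nsize = t->size ? t->size * 2 : 4;
	int *n = calloc((size_t)nsize, sizeof(*n));

	if (n == NULL)
		return ENOMEM;
	for (int i = 0; i < t->size; i++)
		n[i] = t->used[i];
	free(t->used);
	t->used = n;
	t->size = nsize;
	return 0;
}

static int
toy_externalize(struct toy_table *t, int *out, int nfiles)
{
	int *tmp, i, error;

	assert(t != NULL && out != NULL && nfiles > 0);
	tmp = malloc((size_t)nfiles * sizeof(*tmp));	/* temporary index array */
	if (tmp == NULL)
		return ENOMEM;
restart:
	for (i = 0; i < nfiles; i++) {
		error = toy_alloc(t, &tmp[i]);
		if (error == ENOSPC) {
			while (i-- > 0)		/* undo assignments so far */
				t->used[tmp[i]] = 0;
			if ((error = toy_expand(t)) != 0) {
				free(tmp);
				return error;
			}
			goto restart;		/* retry the whole process */
		}
	}
	memcpy(out, tmp, (size_t)nfiles * sizeof(*tmp));	/* copy out last */
	free(tmp);
	return 0;
}
```

Keeping the indexes in a temporary array is what makes the undo step possible without touching the caller's buffer.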


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
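
From the userland side, the same three macros are used when passing a descriptor. The following is a minimal sketch, not kernel code: send_fd() and recv_fd() are hypothetical helper names, and the example assumes a connected AF_LOCAL socket. It sizes the control buffer with CMSG_SPACE(), stores CMSG_LEN() in cmsg_len, and reaches the payload through CMSG_DATA() rather than (cm + 1).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized with CMSG_SPACE() and aligned for struct cmsghdr. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 'x';	/* one data byte so the receiver has something to read */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```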


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
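
The main pitfall in this conversion is the argument order: bcopy() takes the source first, memcpy() and memmove() take the destination first. A small sketch with hypothetical helper names shows each mapping next to the call it replaces.

```c
#include <assert.h>
#include <string.h>

/* Non-overlapping copy. */
static void
copy_name(char *dst, const char *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);			/* was: bcopy(src, dst, len) */
}

/* Overlapping copy within one buffer needs memmove(). */
static void
shift_left(char *buf, size_t len, size_t by)
{
	assert(buf != NULL && by <= len);
	memmove(buf, buf + by, len - by);	/* was: ovbcopy(buf + by, buf, len - by) */
}

/* Zero fill. */
static void
clear_buf(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0, len);			/* was: bzero(buf, len) */
}
```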


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: thorpej-i2c-spi-conf2-base
# 1.201 08-Aug-2021 nia

introduce a SOL_LOCAL for unix-domain socket level socket options
as an alias of the current 0 used for these options, as in FreeBSD.

reviewed by many.


Revision tags: thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.200 06-Nov-2020 christos

PR/55777: Ruslan Nikolaev: Move the unp_sysctl_create to uipc_usrreq.c to
facilitate splitting rump modules and does not require a dummy function.


# 1.199 26-Aug-2020 christos

branches: 1.199.2;
add socket info for user and group for unix sockets in fstat.


Revision tags: bouyer-xenpvh-base2
# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

branches: 1.197.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.2; 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().
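
The general pattern behind this kind of fix is to zero a control-message buffer before filling it, so that alignment padding carries no stale bytes. The sketch below is a userland illustration with hypothetical names (union rights_buf, fill_rights(), padding_is_zero()); where the padding sits relative to the header and payload differs between platforms, so the check skips whatever CMSG_DATA() reports as the payload.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

union rights_buf {
	struct cmsghdr hdr;
	unsigned char bytes[CMSG_SPACE(sizeof(int))];
};

/* Zero the whole buffer first, then fill in header and payload. */
static void
fill_rights(union rights_buf *rb, int fd)
{
	assert(rb != NULL);
	memset(rb, 0, sizeof(*rb));
	rb->hdr.cmsg_len = CMSG_LEN(sizeof(int));
	rb->hdr.cmsg_level = SOL_SOCKET;
	rb->hdr.cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(&rb->hdr), &fd, sizeof(int));
}

/* Returns 1 if every byte outside the header and payload is zero. */
static int
padding_is_zero(union rights_buf *rb)
{
	size_t data_off;

	assert(rb != NULL);
	data_off = (size_t)((unsigned char *)CMSG_DATA(&rb->hdr) - rb->bytes);
	for (size_t i = sizeof(struct cmsghdr); i < sizeof(rb->bytes); i++) {
		if (i >= data_off && i < data_off + sizeof(int))
			continue;	/* payload byte, not padding */
		if (rb->bytes[i] != 0)
			return 0;
	}
	return 1;
}
```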


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
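
As a rough userland illustration of the second point above (not the kernel's
actual code): the bytes between CMSG_LEN and CMSG_SPACE are alignment padding
that copying the payload never writes, so they keep whatever was in the buffer
unless they are cleared explicitly. The helper name scrub_cmsg_padding is
hypothetical; the sketch assumes only the standard CMSG_LEN and CMSG_SPACE
macros from <sys/socket.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: zero the alignment padding that follows the
 * payload of a control message carrying nfds descriptors.  buf must
 * point to the start of the control message and hold at least
 * CMSG_SPACE(nfds * sizeof(int)) bytes.  Returns the number of
 * padding bytes that were cleared.
 */
static size_t
scrub_cmsg_padding(unsigned char *buf, size_t nfds)
{
	size_t len = CMSG_LEN(nfds * sizeof(int));	/* header + payload */
	size_t space = CMSG_SPACE(nfds * sizeof(int));	/* len + padding */

	assert(space >= len);
	memset(buf + len, 0, space - len);
	return space - len;
}
```

The amount of padding depends on the platform's alignment rules and may be
zero for some descriptor counts.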


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
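
A minimal userland sketch of the MSG_CMSG_CLOEXEC item above, assuming a
platform that defines that flag: one descriptor is passed over a local
socketpair and received with close-on-exec already set. The names fd_ctl,
send_fd, recv_fd_cloexec and demo_pass_fd are hypothetical and error handling
is kept to a minimum.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	struct cmsghdr hdr;
	unsigned char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the connected local socket sock; 0 on success. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 'x';	/* one data byte travels with the control message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(fd >= 0);	/* caller must pass a valid descriptor */
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor, asking for close-on-exec; returns fd or -1. */
static int
recv_fd_cloexec(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}

/* Returns 1 if the received descriptor has FD_CLOEXEC set, 0 if not, -1 on error. */
static int
demo_pass_fd(void)
{
	int sv[2], p[2], fd, flags, result = -1;

	if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (pipe(p) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd(sv[0], p[0]) == 0 && (fd = recv_fd_cloexec(sv[1])) != -1) {
		flags = fcntl(fd, F_GETFD);
		if (flags != -1)
			result = (flags & FD_CLOEXEC) != 0;
		close(fd);
	}
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return result;
}
```

Setting the flag at receive time avoids the window between recvmsg(2) and a
later fcntl(2) call in which another thread could fork and exec with the new
descriptor still inheritable.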


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allows
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
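
The descriptor-transfer limit in the list above can be read as a simple
predicate. The following is only a sketch of that condition, not the kernel
code: the function name rights_transfer_allowed and its parameters are
hypothetical stand-ins for kernel state such as the in-transit file count,
maxfiles and unp_rights_ratio.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the check described above: refuse to put
 * more descriptors in flight once the in-transit count exceeds
 * maxfiles / rights_ratio.  All three parameters model kernel state
 * that this sketch does not have access to.
 */
static bool
rights_transfer_allowed(unsigned int in_transit, unsigned int maxfiles,
    unsigned int rights_ratio)
{
	assert(rights_ratio != 0);	/* avoid division by zero */
	return !(in_transit > maxfiles / rights_ratio);
}
```

Note that the division truncates, so with a small maxfiles the threshold can
round down to zero, in which case any descriptor already in transit blocks
further transfers.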


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
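
The effect described in 1.84 can be observed from userland. The following is a minimal sketch, not part of the commit: the helper name bound_socket_mode() and the /tmp path are assumptions made for illustration. It binds a local socket under a given umask and reports the permission bits of the resulting filesystem node.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>

/*
 * Hypothetical helper: bind a local socket while `mask' is the process
 * umask and return the permission bits of the socket node, or -1 on error.
 */
static int
bound_socket_mode(mode_t mask)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int s, mode = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* Illustrative path only; any writable directory would do. */
	snprintf(addr.sun_path, sizeof(addr.sun_path),
	    "/tmp/umask-demo.%ld", (long)getpid());
	(void)unlink(addr.sun_path);

	if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return -1;

	old = umask(mask);		/* apply the mask under test */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    stat(addr.sun_path, &st) == 0)
		mode = (int)(st.st_mode & 0777);
	(void)umask(old);		/* restore the caller's umask */

	(void)unlink(addr.sun_path);
	(void)close(s);
	return mode;
}
```

On a system that honors the umask for local sockets, bound_socket_mode(077) should return a mode with no group or other bits set.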


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
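
The assign, undo, expand and retry loop described for unp_externalize() in 1.50 can be modeled in userland. This is only a sketch of the control flow under stated assumptions: struct table, tab_alloc(), tab_expand() and assign_all() are hypothetical stand-ins for the filedesc table, fdalloc() and fdexpand(), and none of the kernel's locking or reference counting is modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NEED_EXPAND	(-1)	/* stand-in for the "table is full" result */

/* Hypothetical model of a descriptor table: slot[i] == NULL means free. */
struct table {
	void	**slot;
	int	size;
};

/* Never blocks: returns a free index, or NEED_EXPAND if the table is full. */
static int
tab_alloc(const struct table *t)
{
	for (int i = 0; i < t->size; i++)
		if (t->slot[i] == NULL)
			return i;
	return NEED_EXPAND;
}

/* The potentially slow step: grow the table, keeping existing entries. */
static int
tab_expand(struct table *t)
{
	int nsize = t->size ? t->size * 2 : 4;
	void **n = realloc(t->slot, (size_t)nsize * sizeof(*n));

	if (n == NULL)
		return -1;
	memset(n + t->size, 0, (size_t)(nsize - t->size) * sizeof(*n));
	t->slot = n;
	t->size = nsize;
	return 0;
}

/* Assign every file to a slot; on a full table undo, expand and retry. */
static int
assign_all(struct table *t, void **files, int nfiles, int *fds_out)
{
	int *tmp, i, error = 0;

	if (nfiles <= 0)
		return 0;
	if ((tmp = malloc((size_t)nfiles * sizeof(*tmp))) == NULL)
		return -1;
restart:
	for (i = 0; i < nfiles; i++) {
		tmp[i] = tab_alloc(t);
		if (tmp[i] == NEED_EXPAND) {
			/* Undo the assignments made so far. */
			while (--i >= 0)
				t->slot[tmp[i]] = NULL;
			if (tab_expand(t) != 0) {
				error = -1;
				goto out;
			}
			goto restart;
		}
		t->slot[tmp[i]] = files[i];
	}
	/* Only now publish the indexes to the caller's buffer. */
	memcpy(fds_out, tmp, (size_t)nfiles * sizeof(*tmp));
out:
	free(tmp);
	return error;
}

/* Demo: three files into a two-slot table forces one undo and retry. */
static int
assign_demo(void)
{
	struct table t = { calloc(2, sizeof(void *)), 2 };
	int a, b, c, fds[3] = { -1, -1, -1 };
	void *files[3] = { &a, &b, &c };
	int ok;

	if (t.slot == NULL || assign_all(&t, files, 3, fds) != 0)
		return -1;
	ok = t.size == 4 && fds[0] == 0 && fds[1] == 1 && fds[2] == 2 &&
	    t.slot[2] == files[2];
	free(t.slot);
	return ok ? 0 : -1;
}
```

Keeping the indexes in a temporary array is what makes the retry safe in this model: nothing is written to the caller's buffer until every file has a slot.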


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
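
For reference, the same macros are what a userland sender and receiver of SCM_RIGHTS are expected to use. The sketch below is illustrative and not taken from the kernel sources; send_fd(), recv_fd() and fdpass_roundtrip() are hypothetical names. It sizes the control buffer with CMSG_SPACE(), stores CMSG_LEN() in cmsg_len, and reaches the payload only through CMSG_DATA().

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer for exactly one descriptor, aligned for struct cmsghdr. */
union fd_ctl {
	struct cmsghdr	hdr;
	char		buf[CMSG_SPACE(sizeof(int))];
};

/* Send one descriptor plus a single data byte. Returns 0 on success. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);	/* CMSG_SPACE(): buffer size */

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));	/* CMSG_LEN(): header + data */
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));	/* not (cm + 1) */

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new descriptor, or -1 on error. */
static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe across a socketpair and write through it. */
static int
fdpass_roundtrip(void)
{
	int sv[2], p[2], fd, ok = 0;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (pipe(p) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd(sv[0], p[1]) == 0 && (fd = recv_fd(sv[1])) != -1) {
		ok = write(fd, "k", 1) == 1 && read(p[0], &c, 1) == 1 &&
		    c == 'k';
		close(fd);
	}
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

Using memcpy() through CMSG_DATA() avoids any assumption about how the payload is aligned relative to the header, which is the class of problem the LP64 fixes above address.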


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
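
The point to watch in this mapping is the operand order: bcopy() and ovbcopy() take the source first, while memcpy() and memmove() take the destination first. A small userland sketch of the converted forms follows; the helper names are hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <string.h>

/* What a call written as bcopy(src, dst, len) becomes. */
static void
copy_like_bcopy(const void *src, void *dst, size_t len)
{
	/* bcopy(src, dst, len) -> memcpy(dst, src, len): operands swap. */
	memcpy(dst, src, len);
}

/* What a call written as bzero(buf, len) becomes. */
static void
zero_like_bzero(void *buf, size_t len)
{
	/* bzero(buf, len) -> memset(buf, 0, len): the fill value is added. */
	memset(buf, 0, len);
}

/* Returns 0 if the converted calls copy and clear as expected. */
static int
check_conversion(void)
{
	char src[8] = "unpcb";
	char dst[8];

	zero_like_bzero(dst, sizeof(dst));
	if (dst[0] != '\0')
		return -1;
	copy_like_bcopy(src, dst, sizeof(src));
	/* bcmp(x, y, z) -> memcmp(x, y, z): same order, zero means equal. */
	return memcmp(src, dst, sizeof(src)) == 0 ? 0 : -1;
}
```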


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.200 06-Nov-2020 christos

PR/55777: Ruslan Nikolaev: Move the unp_sysctl_create to uipc_usrreq.c to
facilitate splitting rump modules and does not require a dummy function.


Revision tags: thorpej-futex-base
# 1.199 26-Aug-2020 christos

add socket info for user and group for unix sockets in fstat.


Revision tags: bouyer-xenpvh-base2
# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

branches: 1.197.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.2; 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating system (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
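
The idea behind this change and the request-by-request splits that follow it can be sketched in userland C. Everything named toy_* here is hypothetical; the real struct pr_usrreqs has kernel-specific members and signatures that are not reproduced. The sketch only shows the shape: requests with their own typed entry points, and a generic fallback for the ones not yet split out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request codes standing in for the PRU_* constants. */
enum toy_req { TOY_ATTACH, TOY_DETACH, TOY_OTHER };

struct toy_socket {
	int attached;		/* set by attach, cleared by detach */
	int generic_calls;	/* requests that went through the switch */
};

/* Old style: one entry point that switches on the request code. */
static int
toy_generic(struct toy_socket *so, enum toy_req req)
{
	assert(req != TOY_ATTACH && req != TOY_DETACH);	/* already split out */
	switch (req) {
	default:
		so->generic_calls++;
		return 0;
	}
}

/* New style: each split-out request gets its own typed function. */
static int
toy_attach(struct toy_socket *so)
{
	so->attached = 1;
	return 0;
}

static int
toy_detach(struct toy_socket *so)
{
	so->attached = 0;
	return 0;
}

/* Table of entry points, in the spirit of struct pr_usrreqs. */
struct toy_usrreqs {
	int (*pr_attach)(struct toy_socket *);
	int (*pr_detach)(struct toy_socket *);
	int (*pr_generic)(struct toy_socket *, enum toy_req);
};

static const struct toy_usrreqs toy_ops = {
	.pr_attach = toy_attach,
	.pr_detach = toy_detach,
	.pr_generic = toy_generic,
};
```

Each split moves one case out of the switch into a member with a signature that fits the request, which is why the later entries can drop parameters such as the overloaded struct mbuf pointers.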


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.
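
For illustration, a common userland way to compute a sockaddr_un length that reflects the pathname actually stored, instead of the full structure size. This is a sketch only: the helper name sun_addr_len() is hypothetical, and whether the terminating NUL is counted differs between systems. This version does not count it, and the kernel's exact rule is the one explained in the source, not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/un.h>

/*
 * Length of a sockaddr_un holding `path`: the fixed header plus the
 * pathname bytes, without the unused tail of sun_path.
 * Returns 0 if the path does not fit. Hypothetical helper.
 */
static size_t
sun_addr_len(const char *path)
{
	size_t plen = strlen(path);

	if (plen >= sizeof(((struct sockaddr_un *)0)->sun_path))
		return 0;
	return offsetof(struct sockaddr_un, sun_path) + plen;
}
```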


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
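
The CMSG_LEN/CMSG_SPACE distinction above is also visible from userland. A minimal sketch of passing one descriptor with SCM_RIGHTS, assuming a connected AF_UNIX socket; the helper names send_fd(), recv_fd() and fd_pass_roundtrip() are hypothetical. The control buffer is sized with CMSG_SPACE and zeroed, while cmsg_len is set with CMSG_LEN, so the padding between the two never carries stray bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

union fd_ctl {
	struct cmsghdr hdr;			/* forces alignment */
	char buf[CMSG_SPACE(sizeof(int))];	/* header + data + padding */
};

/* Send descriptor `fd` over `sock`. Returns 0 on success, -1 on error. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&ctl, 0, sizeof(ctl));	/* scrub padding as well */
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));	/* data length, no padding */
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns it, or -1 on error. */
static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return fd;
}

/* Pass a pipe's read end across a socketpair and read through the copy. */
static int
fd_pass_roundtrip(void)
{
	int sv[2], pfd[2], got, rv = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (pipe(pfd) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd(sv[0], pfd[0]) == 0 && (got = recv_fd(sv[1])) != -1) {
		if (write(pfd[1], "k", 1) == 1 && read(got, &c, 1) == 1)
			rv = c;
		close(got);
	}
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return rv;
}
```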


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html
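
What SOCK_SEQPACKET adds over SOCK_STREAM on a local socket is record boundaries on a connected socket. A minimal sketch, with a hypothetical helper name: two records are sent back to back, and a single recv(2) with a large buffer returns only the first.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send a 3-byte and a 5-byte record over a SOCK_SEQPACKET socketpair
 * and return how many bytes the first recv(2) delivers, or -1 on error.
 */
static long
seqpacket_first_record_len(void)
{
	int sv[2];
	char buf[64];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
		return -1;
	if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "defgh", 5, 0) == 5)
		n = recv(sv[1], buf, sizeof(buf), 0);	/* one record only */
	close(sv[0]);
	close(sv[1]);
	return (long)n;
}
```

With SOCK_STREAM the same recv(2) would be free to return both writes merged into one read.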


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allows
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
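
The in-transit limit in the list above is a plain ratio test. A sketch of it as a standalone predicate, with hypothetical names; the log does not say whether the count includes the descriptors of the message being sent, and the treatment of a zero ratio is this sketch's own choice.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if a transfer is still allowed, i.e. the number of files
 * in transit has not exceeded maxfiles / ratio. Hypothetical helper.
 */
static bool
rights_transfer_allowed(unsigned int in_transit, unsigned int maxfiles,
    unsigned int ratio)
{
	if (ratio == 0)
		return false;	/* sketch's choice: refuse rather than divide */
	return !(in_transit > maxfiles / ratio);
}
```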


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.
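
Taken together with the lower bound from the following revision, the check amounts to a range test on cmsg_len. A userland sketch with a hypothetical helper name; it uses CMSG_LEN(0), which is the aligned header size, as a portable stand-in for CMSG_ALIGN(sizeof(struct cmsghdr)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * A control message length is plausible if it covers at least the
 * aligned header and does not run past the buffer that holds it.
 */
static bool
cmsg_len_plausible(size_t cmsg_len, size_t buf_len)
{
	return cmsg_len >= CMSG_LEN(0) && cmsg_len <= buf_len;
}
```

Equality with the buffer length is accepted but no longer required, which is what lets a sender pass a buffer sized with CMSG_SPACE while cmsg_len is set with CMSG_LEN.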


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.
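
The rule in C terms: forming a pointer one past the last element is defined, forming one before the first element is not. A sketch of a reverse loop that stays within the rule (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array back to front without ever forming the pointer a - 1. */
static long
sum_backwards(const int *a, size_t n)
{
	const int *p = a + n;	/* one past the end: valid to form */
	long sum = 0;

	while (p != a)
		sum += *--p;	/* step back first, then dereference */
	return sum;
}
```

The tempting alternative, starting at a + n - 1 and looping while p >= a, ends by computing a - 1, which is exactly the case the commit message rules out.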


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now that soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange the code a bit, create
the data in a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
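
The "larval" idea above can be sketched in userland as follows. This is a toy model under stated assumptions: the names toy_fdtab, toy_reserve(), toy_set_mature() and toy_getfile() are invented for illustration and do not correspond to the kernel's data structures or to the real fd_getfile() signature.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NSLOTS	8
#define TOY_LARVAL	0x01	/* slot is filled but not yet usable */

struct toy_slot {
	void	*file;
	int	 flags;
};

struct toy_fdtab {
	struct toy_slot slot[TOY_NSLOTS];
};

/*
 * Reserve the lowest empty slot and mark it larval.  The allocator
 * treats a larval slot as filled, so it is never handed out twice.
 * Returns the descriptor number, or -1 if the table is full.
 */
static int
toy_reserve(struct toy_fdtab *t, void *file)
{
	for (int fd = 0; fd < TOY_NSLOTS; fd++) {
		if (t->slot[fd].file == NULL) {
			t->slot[fd].file = file;
			t->slot[fd].flags = TOY_LARVAL;
			return fd;
		}
	}
	return -1;
}

/* "Mature" a descriptor, which simply clears the larval flag. */
static void
toy_set_mature(struct toy_fdtab *t, int fd)
{
	assert(fd >= 0 && fd < TOY_NSLOTS);
	t->slot[fd].flags &= ~TOY_LARVAL;
}

/* Single lookup path: larval slots look like non-existent descriptors. */
static void *
toy_getfile(const struct toy_fdtab *t, int fd)
{
	if (fd < 0 || fd >= TOY_NSLOTS)
		return NULL;
	if (t->slot[fd].file == NULL || (t->slot[fd].flags & TOY_LARVAL))
		return NULL;
	return t->slot[fd].file;
}
```

Routing every lookup through one function is what lets the larval check live in a single place, which mirrors the motivation given for gathering the lookup code.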


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
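
The retry algorithm listed above can be modelled roughly like this. It is a userland sketch, not the kernel code: toy_table, toy_fdalloc(), toy_fdexpand() and toy_externalize() are hypothetical stand-ins, the f_msgcount and unp_rights bookkeeping is omitted, and realloc() plays the part of the potentially blocking expansion.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_table {
	void	**slots;
	int	  nslots;
};

/* Find a free slot without growing the table.  Returns 0 or ENOSPC. */
static int
toy_fdalloc(struct toy_table *t, int *fdp)
{
	for (int i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*fdp = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Grow the table; the only step that may "block" in this model. */
static int
toy_fdexpand(struct toy_table *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return ENOMEM;
	memset(p + t->nslots, 0, (size_t)(n - t->nslots) * sizeof(*p));
	t->slots = p;
	t->nslots = n;
	return 0;
}

/* Assign every file to a slot, restarting from scratch after a grow. */
static int
toy_externalize(struct toy_table *t, void **files, int nfiles, int *fds_out)
{
	int *tmp, error = 0, i;

	assert(nfiles > 0);
	tmp = malloc((size_t)nfiles * sizeof(*tmp));	/* temporary index array */
	if (tmp == NULL)
		return ENOMEM;
restart:
	for (i = 0; i < nfiles; i++) {
		error = toy_fdalloc(t, &tmp[i]);
		if (error == ENOSPC) {
			/* Undo the assignments made so far, expand, retry. */
			while (--i >= 0)
				t->slots[tmp[i]] = NULL;
			error = toy_fdexpand(t);
			if (error != 0)
				goto out;
			goto restart;
		}
		t->slots[tmp[i]] = files[i];
	}
	/* Only now copy the indexes out to the caller's buffer. */
	memcpy(fds_out, tmp, (size_t)nfiles * sizeof(*tmp));
out:
	free(tmp);
	return error;
}
```

Keeping the indexes in a temporary array is what makes the loop repeatable: nothing is written to the caller's buffer until every file has a slot.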


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
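
For readers unfamiliar with the macros named above, here is a userland sketch of passing one descriptor with SCM_RIGHTS. It shows the same division of labour from the application side: CMSG_SPACE() sizes the buffer, CMSG_LEN() goes into cmsg_len, and CMSG_DATA() locates the payload. The helper names send_fd() and recv_fd() are hypothetical, and the code assumes CMSG_SPACE() is a compile-time constant on the target system.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized with CMSG_SPACE() and aligned for a cmsghdr. */
union fd_ctl {
	struct cmsghdr	hdr;
	char		buf[CMSG_SPACE(sizeof(int))];
};

/* Send one descriptor as ancillary data.  Returns 0 or -1. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));	/* header plus payload */
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));	/* not (cm + 1) */

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor.  Returns the new descriptor or -1. */
static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

Using memcpy() with CMSG_DATA() avoids any assumption about the alignment of the payload relative to the header, which is the class of problem this entry describes on LP64 systems.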


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
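
The point that is easy to miss in the mapping above is that the source and destination pointers swap places for the copy routines. A small sketch, with hypothetical wrapper names used only to make the argument order visible:

```c
#include <assert.h>
#include <string.h>

/* Old style: bcopy(src, dst, len).  memcpy() takes the destination first. */
static void
copy_block(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);		/* regions must not overlap */
}

/* Old style: ovbcopy(buf, buf + 1, len - 1) for overlapping regions. */
static void
shift_right_one(char *buf, size_t len)
{
	assert(len >= 1);
	memmove(buf + 1, buf, len - 1);	/* overlap is allowed here */
}

/* Old style: bzero(p, len). */
static void
clear_block(void *p, size_t len)
{
	memset(p, 0, len);
}
```

Only memmove() is safe for overlapping regions, which is why ovbcopy() maps to it rather than to memcpy().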


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.199 26-Aug-2020 christos

add socket info for user and group for unix sockets in fstat.


Revision tags: bouyer-xenpvh-base2
# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

branches: 1.197.4;
Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
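
The distinction drawn above, between a load that precedes a dereference and a load that only tests for presence, can be illustrated with C11 atomics. This is a sketch under stated assumptions: the toy_ names are invented, C11 memory_order_consume stands in for the kernel's atomic_load_consume, and the publisher uses a release store where the commit notes the kernel relies on a preceding mutex_exit instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_file {
	int f_count;
};

struct toy_fdfile {
	_Atomic(struct toy_file *) ff_file;
};

/*
 * Publisher: initialise the object, then make the pointer visible.
 * A release store is the userland stand-in used in this sketch.
 */
static void
toy_publish(struct toy_fdfile *ff, struct toy_file *fp)
{
	fp->f_count = 1;
	atomic_store_explicit(&ff->ff_file, fp, memory_order_release);
}

/* Reader that is about to dereference: needs the consume ordering. */
static int
toy_read_count(struct toy_fdfile *ff)
{
	struct toy_file *fp =
	    atomic_load_explicit(&ff->ff_file, memory_order_consume);

	return fp == NULL ? -1 : fp->f_count;
}

/* Reader that only tests for presence: a relaxed load is enough. */
static int
toy_is_open(struct toy_fdfile *ff)
{
	return atomic_load_explicit(&ff->ff_file,
	    memory_order_relaxed) != NULL;
}
```

The ordering matters only when the loaded pointer is followed; a bare NULL test reads nothing through the pointer and so needs no dependency ordering.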


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols' functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
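
For orientation, the user-visible side of the descriptor passing that
unp_internalize() and unp_externalize() service can be sketched as
follows. This is a userland illustration only, not the kernel code from
this revision; the helper names send_fd() and recv_fd() and the
128-byte control buffer are assumptions made for the example. The
sender zeroes its control buffer so that the padding between CMSG_LEN
and CMSG_SPACE carries no stale bytes, and the receiver checks cmsg_len
before reading the payload.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer with cmsghdr alignment; 128 bytes is an arbitrary size. */
union ctlbuf {
	struct cmsghdr hdr;
	char buf[128];
};

/* Send one descriptor over a connected AF_UNIX socket; 0 on success. */
int
send_fd(int sock, int fd)
{
	union ctlbuf u;
	char byte = 'x';	/* one byte of ordinary data to carry the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(CMSG_SPACE(sizeof(int)) <= sizeof(u.buf));
	memset(&u, 0, sizeof(u));	/* no garbage in the padding */
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int));

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1. */
int
recv_fd(int sock)
{
	union ctlbuf u;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;	/* never read past cmsg_len */
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

A caller would typically create the channel with socketpair(2), pass a
descriptor with send_fd() on one end, and use the value returned by
recv_fd() on the other end like any other open descriptor.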


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
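
The lower bound added here, together with the upper bound against
control->m_len from revision 1.116, amounts to a range check on
cmsg_len. A minimal userland sketch of that idea follows. It is not the
code from unp_internalize(); the function name is hypothetical, and it
assumes that CMSG_LEN(0) equals the aligned header size that
CMSG_ALIGN(sizeof cmsghdr) denotes in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Return 1 if a control buffer of buf_len bytes starts with a header
 * whose cmsg_len lies between the aligned header size and buf_len,
 * 0 otherwise.
 */
int
cmsg_len_plausible(const void *buf, size_t buf_len)
{
	struct cmsghdr cm;

	assert(buf != NULL);
	if (buf_len < sizeof(cm))
		return 0;
	memcpy(&cm, buf, sizeof(cm));	/* avoid alignment assumptions */
	if ((size_t)cm.cmsg_len < CMSG_LEN(0))	/* shorter than the header */
		return 0;
	if ((size_t)cm.cmsg_len > buf_len)	/* runs past the buffer */
		return 0;
	return 1;
}
```

Note that the upper bound is an inequality rather than an exact match,
so a buffer with trailing space after the first message still passes.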


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
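
The effect can be observed from userland with a sketch like the one
below: bind a local socket under a given umask and inspect the
permission bits of the socket file that appears in the file system.
The helper name bound_socket_mode() is made up for this illustration,
and the expectation that masked-out bits are absent is the behavior
this commit describes, not something the helper enforces.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind a fresh AF_UNIX socket at path while `mask` is the process umask
 * and return the permission bits of the new socket file, or -1.
 */
int
bound_socket_mode(const char *path, mode_t mask)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int s, rv = -1;

	assert(strlen(path) < sizeof(addr.sun_path));
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	memcpy(addr.sun_path, path, strlen(path));

	if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return -1;
	old = umask(mask);		/* applies only for the bind() below */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    stat(path, &st) == 0)
		rv = (int)(st.st_mode & 0777);
	umask(old);			/* restore the caller's umask */
	close(s);
	return rv;
}
```

With a mask of 077, for instance, the group and other permission bits
of the returned mode would be expected to be clear; the caller remains
responsible for unlinking the socket file afterwards.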


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
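
The retry scheme listed above can be sketched in userland C. This is a toy
model under stated assumptions, not the kernel code: the toy_* names, the
table layout and the doubling policy are all hypothetical, and only the
"assign, undo on ENOSPC, expand, retry" shape is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy descriptor table; every name here is hypothetical, not a kernel API. */
struct toy_table {
	void **slots;	/* NULL means the slot is free */
	int nslots;
};

/* Non-blocking allocation: find a free slot or report ENOSPC. */
static int
toy_fdalloc(struct toy_table *t, int *fdp)
{
	for (int i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*fdp = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Potentially blocking step: grow the table (doubling is an assumption). */
static int
toy_fdexpand(struct toy_table *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return ENOMEM;
	for (int i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/* Assign nfiles files to slots, recording the indexes in the temporary fds[]. */
static int
toy_externalize(struct toy_table *t, void **files, int nfiles, int *fds)
{
	int error;

restart:
	for (int i = 0; i < nfiles; i++) {
		assert(files[i] != NULL);	/* NULL would look like a free slot */
		error = toy_fdalloc(t, &fds[i]);
		if (error == 0) {
			t->slots[fds[i]] = files[i];
			continue;
		}
		/* Undo the assignments made so far ... */
		while (--i >= 0)
			t->slots[fds[i]] = NULL;
		/* ... expand (may block), then retry the whole process. */
		if ((error = toy_fdexpand(t)) != 0)
			return error;
		goto restart;
	}
	return 0;
}
```

Because the indexes live in a caller-supplied temporary array, nothing visible
to the caller changes until the loop has succeeded for every file.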


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
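
For reference, a minimal userland sketch of the same macro discipline when
passing a descriptor: CMSG_SPACE() sizes the control buffer, CMSG_LEN() fills
cmsg_len, and CMSG_DATA() locates the payload. The helper names (send_fd,
recv_fd, fd_roundtrip) are hypothetical and this is not the kernel code
touched by the commit.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Control buffer sized with CMSG_SPACE(); the union keeps it aligned. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one descriptor over a connected AF_LOCAL socket. Returns 0 or -1. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(sock >= 0 && fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));		/* header plus payload */
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));	/* not (cm + 1) */

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new descriptor or -1. */
static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}

/* Pass the read end of a pipe across a socketpair. Returns 0 on success. */
static int
fd_roundtrip(void)
{
	int sv[2], pfd[2], got = -1, ok = -1;
	char c = 0;

	if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (pipe(pfd) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd(sv[0], pfd[0]) == 0 && (got = recv_fd(sv[1])) >= 0) {
		/* Data written to the pipe must be readable via the passed fd. */
		if (write(pfd[1], "x", 1) == 1 &&
		    read(got, &c, 1) == 1 && c == 'x')
			ok = 0;
		close(got);
	}
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```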


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
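
Note that the first two arguments swap places for bcopy and ovbcopy. A small
sketch of the mapping in use; the helper names are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <string.h>

/* was: bcopy(src, dst, len) -- source and destination swap places */
static void
copy_name(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
}

/* was: ovbcopy(buf + 1, buf, len - 1) -- overlapping regions need memmove */
static void
shift_left(char *buf, size_t len)
{
	assert(len > 0);
	memmove(buf, buf + 1, len - 1);
}

/* was: bcmp(a, b, len) == 0 */
static int
same_bytes(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}

/* was: bzero(buf, len) */
static void
clear_buf(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```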


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.198 21-Apr-2020 ad

Revert the changes made in February to make cwdinfo use mostly lockless,
which relied on taking extra vnode refs.

Having benchmarked various experimental changes over the past few months it
seems that it's better to avoid vnode refs as much as possible. cwdi_lock
as a RW lock already did that to some extent for getcwd() and will permit
the same for namei() too.


Revision tags: phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3
# 1.197 23-Feb-2020 ad

Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
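
The load pattern can be approximated in userland with C11 atomics. This is a
sketch under assumptions: the toy_* types and functions are hypothetical,
memory_order_consume stands in for the kernel's atomic_load_consume, and the
release store stands in for whatever ordering the kernel gets from
mutex_exit; it is not the code from this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct file and struct fdfile. */
struct toy_file {
	int f_count;
};

struct toy_fdfile {
	_Atomic(struct toy_file *) ff_file;
};

/* Publish a fully initialized file; release orders the init before the store. */
static void
toy_install(struct toy_fdfile *ff, struct toy_file *fp)
{
	assert(fp != NULL);
	atomic_store_explicit(&ff->ff_file, fp, memory_order_release);
}

/* Only testing for presence, never dereferencing: a relaxed load is enough. */
static int
toy_is_open(struct toy_fdfile *ff)
{
	return atomic_load_explicit(&ff->ff_file, memory_order_relaxed) != NULL;
}

/* About to dereference: use a consume load so the pointee's contents are visible. */
static int
toy_count(struct toy_fdfile *ff)
{
	struct toy_file *fp =
	    atomic_load_explicit(&ff->ff_file, memory_order_consume);

	return fp == NULL ? -1 : fp->f_count;
}
```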


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
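
The three steps described for soroverflow() can be sketched as a toy model.
The struct layout and the toy_* names below are assumptions made for
illustration; only the sequence "count, set ENOBUFS, wake the receiver" comes
from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a socket's receive-side state. */
struct toy_socket {
	unsigned long overflowed;	/* number of receive buffer overflows */
	int error;			/* pending error reported to the reader */
	int wakeups;			/* stand-in for waking a sleeping reader */
};

/* Stand-in for the real wakeup: just record that it happened. */
static void
toy_sorwakeup(struct toy_socket *so)
{
	so->wakeups++;
}

/* Count the overflow, flag ENOBUFS, and wake the receiver. */
static void
toy_soroverflow(struct toy_socket *so)
{
	assert(so != NULL);
	so->overflowed++;
	so->error = ENOBUFS;
	toy_sorwakeup(so);
}
```

A reader that sees ENOBUFS knows messages were dropped and can re-read the
full state instead of trusting the stream of updates.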


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make behavior of getsockname(2) (and maybe getpeername(2)) as the same as
NetBSD 6.1_STABLE and other operating system (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently from Linux and explain why. Unit tests
to come.
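
The length computation described in 1.146 can be illustrated with a small userland sketch. The helper name sun_addr_len() is hypothetical, and the convention shown (fixed header plus path bytes, no trailing NUL) is an assumption for illustration; the log does not say exactly which convention the kernel settled on. The value 106 mentioned above is consistent with the size of the whole structure when sun_path is 104 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: length of a sockaddr_un holding `path`, counting
 * the fixed header plus the path bytes (no trailing NUL, by assumption).
 * Returns 0 if the path does not fit in sun_path.
 */
static size_t
sun_addr_len(const char *path)
{
	size_t plen;

	assert(path != NULL);
	plen = strlen(path);
	if (plen >= sizeof(((struct sockaddr_un *)0)->sun_path))
		return 0;
	return offsetof(struct sockaddr_un, sun_path) + plen;
}
```

With this convention a short path such as "/tmp/x" yields a length well below sizeof(struct sockaddr_un), which is the point of the change: the reported length tracks the actual name.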


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell
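
The zero-descriptor case in 1.143 comes down to simple arithmetic on cmsg_len. The sketch below is userland code, not the kernel's; rights_count() is a hypothetical helper that derives how many descriptors an SCM_RIGHTS header claims to carry, so that a caller could return early when the answer is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: number of int-sized descriptors described by an
 * SCM_RIGHTS control message header.  CMSG_LEN(0) is the length of a
 * message with an empty payload.
 */
static size_t
rights_count(const struct cmsghdr *cm)
{
	assert(cm != NULL);
	if (cm->cmsg_len < CMSG_LEN(0))
		return 0;
	return (cm->cmsg_len - CMSG_LEN(0)) / sizeof(int);
}
```

A message with cmsg_len == CMSG_LEN(0) gives a count of zero, which is the case where no temporary storage should be allocated.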


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
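
The second item in 1.141 refers to the alignment padding between CMSG_LEN() and CMSG_SPACE(). A minimal userland sketch of scrubbing that gap follows; cmsg_scrub_padding() is a hypothetical name, and the code only illustrates the idea rather than reproducing unp_externalize().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: zero the alignment padding that follows `datalen`
 * payload bytes in a control message starting at `buf`.  The caller must
 * provide at least CMSG_SPACE(datalen) bytes.  Returns the number of
 * padding bytes cleared.
 */
static size_t
cmsg_scrub_padding(unsigned char *buf, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t space = CMSG_SPACE(datalen);	/* header + payload + padding */

	assert(space >= used);
	memset(buf + used, 0, space - used);
	return space - used;
}
```

Whether any padding exists depends on the platform's alignment rules and the payload size, so the returned count may be zero.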


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer dereference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
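
Taken together with the relaxed upper bound from revision 1.116, the check in 1.117 amounts to a two-sided range test on cmsg_len. The following is a userland sketch under that reading; cmsg_len_ok() is a hypothetical helper, and CMSG_LEN(0) is used as a portable stand-in for the aligned header size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: accept a control message only if its claimed
 * length covers at least the aligned header and does not run past the
 * `buflen` bytes actually supplied.
 */
static int
cmsg_len_ok(size_t cmsg_len, size_t buflen)
{
	/* CMSG_LEN(0) is the aligned header size, the smallest valid length. */
	assert(CMSG_LEN(0) >= sizeof(struct cmsghdr));
	if (cmsg_len < CMSG_LEN(0))
		return 0;
	if (cmsg_len > buflen)
		return 0;
	return 1;
}
```

Note that the upper bound is an inequality, not an equality: a buffer larger than the message it contains is accepted.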


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
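
The effect of 1.84 on the permissions of a newly bound socket file can be shown with a one-line mask computation. This is a sketch only: socket_file_mode() is a hypothetical helper, and the assumption that the requested mode is 0777 before masking is mine, not something stated in the log.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical helper: permission bits a new socket file would receive
 * when the requested mode is filtered through the process umask.
 */
static mode_t
socket_file_mode(mode_t requested, mode_t cmask)
{
	/* Only permission bits are expected here. */
	assert((requested & ~(mode_t)07777) == 0);
	return requested & ~cmask;
}
```

For example, a requested mode of 0777 with the common umask 022 yields 0755, and a umask of 077 yields 0700.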


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
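
The assign, undo, expand and retry loop described in 1.50 can be modelled in userland with a toy table. Everything below is hypothetical (struct toy_table, toy_alloc(), toy_expand(), toy_assign_all()); it mirrors only the control flow suggested in the message, with toy_alloc() standing in for the non-blocking fdalloc() and toy_expand() for the potentially blocking fdexpand().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a descriptor table: a NULL slot is free. */
struct toy_table {
	const void **slot;
	int size;
};

/* Non-blocking allocation: returns a free index, or -1 if the table is full. */
static int
toy_alloc(struct toy_table *t, const void *obj)
{
	int i;

	assert(obj != NULL);
	for (i = 0; i < t->size; i++) {
		if (t->slot[i] == NULL) {
			t->slot[i] = obj;
			return i;
		}
	}
	return -1;
}

/* Growth step (the part that may block in a kernel): double the table. */
static int
toy_expand(struct toy_table *t)
{
	int i, nsize = t->size ? t->size * 2 : 4;
	const void **n = realloc(t->slot, (size_t)nsize * sizeof(*n));

	if (n == NULL)
		return -1;
	for (i = t->size; i < nsize; i++)
		n[i] = NULL;
	t->slot = n;
	t->size = nsize;
	return 0;
}

/*
 * Assign every object to a slot, recording indexes in the temporary
 * array idx[].  If the table fills up part-way, undo the assignments
 * made so far, expand, and retry the whole batch.
 */
static int
toy_assign_all(struct toy_table *t, const void **obj, int nobj, int *idx)
{
	int i;

restart:
	for (i = 0; i < nobj; i++) {
		idx[i] = toy_alloc(t, obj[i]);
		if (idx[i] == -1) {
			while (--i >= 0)
				t->slot[idx[i]] = NULL;
			if (toy_expand(t) != 0)
				return -1;
			goto restart;
		}
	}
	return 0;
}
```

Keeping the indexes in a separate array is what makes the retry safe: nothing is written to the caller-visible message until every slot has been obtained.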


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
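
From the userland side, the same three macros are used when passing a
descriptor. The following is a minimal sketch, assuming a connected
local-domain socket; send_fd(), recv_fd() and union fd_cmsg are
hypothetical helper names, not part of any system API.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized with CMSG_SPACE(); the union forces alignment. */
union fd_cmsg {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one descriptor plus a single dummy data byte; returns 0 or -1. */
static int
send_fd(int sock, int fd)
{
	union fd_cmsg u;
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	assert(sock >= 0 && fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));	/* header + payload, no padding */
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));	/* not (cm + 1) */

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1. */
static int
recv_fd(int sock)
{
	union fd_cmsg u;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	assert(sock >= 0);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return fd;
}
```

CMSG_SPACE() sizes the buffer including trailing padding, CMSG_LEN() is
what goes in cmsg_len, and CMSG_DATA() locates the payload regardless of
how the platform aligns it.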


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
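
The point to watch in this mapping is that the source and destination
arguments swap places. A small sketch of the four conversions follows; the
compat_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* bcopy(src, dst, len): source first. memcpy(dst, src, len): dest first. */
static void
compat_bcopy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);		/* regions must not overlap */
}

static void
compat_ovbcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);		/* overlapping regions are fine */
}

/* bcmp only distinguishes equal from unequal; memcmp also orders. */
static int
compat_bcmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) != 0;
}

static void
compat_bzero(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/* Small demonstration of the overlapping case. */
static void
compat_demo(void)
{
	char buf[8] = "abcdef";

	compat_ovbcopy(buf, buf + 2, 4);	/* shift "abcd" right by two */
	assert(compat_bcmp(buf, "ababcd", 6) == 0);
	compat_bzero(buf, sizeof(buf));
	assert(buf[0] == '\0');
}
```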


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.197 23-Feb-2020 ad

Merge from ad-namecache:

- Have a stab at clustering the members of vnode_t and vnode_impl_t in a
more cache-conscious way. With that done, go back to adjusting v_usecount
with atomics and keep vi_lock directly in vnode_impl_t (saves KVA).

- Allow VOP_LOCK(LK_NONE) for the benefit of VFS_VGET() and VFS_ROOT().
Make sure LK_UPGRADE always comes with LK_NOWAIT.

- Make cwdinfo use mostly lockless.


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)
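
The distinction between "only testing whether it's there" and "about to
dereference it" can be illustrated with C11 atomics in userland. This is
an analogy under stated assumptions, not the kernel API: file_stub, slot,
publish(), slot_in_use() and read_type() are hypothetical names, and the
sketch publishes with a release store because it has no mutex, whereas
the commit relies on the preceding mutex_exit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct file_stub {			/* hypothetical stand-in for struct file */
	int f_type;
};

static _Atomic(struct file_stub *) slot;	/* stand-in for ff_file */

/* Initialize the object first, then publish the pointer. */
static void
publish(struct file_stub *fp, int type)
{
	assert(fp != NULL);
	fp->f_type = type;
	atomic_store_explicit(&slot, fp, memory_order_release);
}

/* Only tests for presence, never dereferences: relaxed is enough. */
static int
slot_in_use(void)
{
	return atomic_load_explicit(&slot, memory_order_relaxed) != NULL;
}

/* About to dereference: the load must order the dependent read. */
static int
read_type(void)
{
	struct file_stub *fp =
	    atomic_load_explicit(&slot, memory_order_consume);

	return fp != NULL ? fp->f_type : -1;
}
```

Current C compilers generally treat memory_order_consume as acquire, so
the sketch is at least as strongly ordered as the name suggests.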


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of the sockaddr_un structure is always set to sizeof(sun_path).
* the pathname stored in sun_path is always '\0' terminated (except when the
length of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, a runtime problem of the lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.
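
The idea of an address length that tracks the actual pathname, instead of
a fixed maximum, can be sketched from the userland side. This is an
assumption-laden illustration: make_sun() is a hypothetical helper, and
the log does not say whether the kernel counts the terminating NUL, so
this sketch simply excludes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill in *sun for `path' and return the address length that covers only
 * the bytes actually used, or 0 if the path does not fit with its NUL.
 */
static socklen_t
make_sun(struct sockaddr_un *sun, const char *path)
{
	size_t n;

	assert(sun != NULL && path != NULL);
	n = strlen(path);
	if (n >= sizeof(sun->sun_path))
		return 0;
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	memcpy(sun->sun_path, path, n + 1);	/* keep it NUL-terminated */
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n);
}
```

The returned value is the sort of length that would be passed to bind(2)
or connect(2); on systems that have a sun_len member, the caller would
store the same value there.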


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
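
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment
padding after the payload. A minimal sketch of scrubbing it follows;
scrub_cmsg_padding() is a hypothetical helper written for illustration
and is not the function used in the kernel.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Zero the bytes between the end of the payload (CMSG_LEN) and the end
 * of the aligned message (CMSG_SPACE). `buf' points at the cmsghdr and
 * must be at least CMSG_SPACE(datalen) bytes long. Returns the number
 * of padding bytes cleared, which may be zero on some platforms.
 */
static size_t
scrub_cmsg_padding(unsigned char *buf, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);
	size_t total = CMSG_SPACE(datalen);

	assert(buf != NULL);
	memset(buf + used, 0, total - used);
	return total - used;
}
```

Without a step like this, whatever happened to be in the buffer before
would be copied out along with the message.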


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PL_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
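
The deferred-close idea described for unp_scan and unp_gc above can be
sketched in isolation as follows. This is a simplified model, not the kernel
code: struct dfile, struct dqueue, defer_close() and drain_deferred() are
hypothetical names, locking is omitted, and incrementing a counter stands in
for actually closing a file.

```c
#include <assert.h>
#include <stddef.h>

struct dfile {
    unsigned defer_count;   /* closes still owed to this file */
    unsigned closes_done;   /* stand-in for the effect of closing */
    struct dfile *next;     /* link on the deferred-close list */
};

struct dqueue {
    struct dfile *head;
};

/* Queue a close; a file already on the list only has its count bumped. */
void defer_close(struct dqueue *q, struct dfile *f)
{
    assert(q != NULL && f != NULL);
    if (f->defer_count++ == 0) {
        f->next = q->head;
        q->head = f;
    }
}

/* Close each queued file as many times as its count says, until empty. */
void drain_deferred(struct dqueue *q)
{
    while (q->head != NULL) {
        struct dfile *f = q->head;
        unsigned n = f->defer_count;

        q->head = f->next;
        f->next = NULL;
        f->defer_count = 0;
        while (n-- > 0)
            f->closes_done++;   /* a real close could queue more files */
    }
}
```

Because the close happens in the drain loop rather than at the point where
the file is found, a close that exposes further files only lengthens the
list instead of recursing.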


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
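
Combined with the upper bound introduced in revision 1.116, the length check
amounts to a two-sided test on cmsg_len. A minimal userland sketch follows;
cmsg_len_ok() is a hypothetical helper, buflen stands in for control->m_len,
and CMSG_LEN(0) is used as a portable spelling of
CMSG_ALIGN(sizeof(struct cmsghdr)), which is an assumption that holds on
common implementations:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Returns 1 if cm->cmsg_len is at least an aligned header and fits in a
 * buffer of buflen bytes, otherwise 0.
 */
int cmsg_len_ok(const struct cmsghdr *cm, size_t buflen)
{
    size_t len;

    assert(cm != NULL);
    len = (size_t)cm->cmsg_len;
    if (len < CMSG_LEN(0))      /* shorter than the aligned header */
        return 0;
    if (len > buflen)           /* claims more data than the buffer holds */
        return 0;
    return 1;
}
```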


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not handle.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
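
The allocate, undo, expand and retry loop described above can be modelled
outside the kernel roughly as follows. This is a simplified sketch under
stated assumptions: struct fdtable, tbl_alloc(), tbl_expand() and
install_all() are hypothetical stand-ins for the filedesc table, fdalloc(),
fdexpand() and the slot-assignment part of unp_externalize(), and no locking
or reference counting is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fdtable {
    void **slots;   /* NULL means the slot is free */
    int nslots;
};

/* Like fdalloc(): never grows the table, returns ENOSPC when it is full. */
static int tbl_alloc(struct fdtable *t, int *out)
{
    for (int i = 0; i < t->nslots; i++) {
        if (t->slots[i] == NULL) {
            *out = i;
            return 0;
        }
    }
    return ENOSPC;
}

/* Like fdexpand(): the only step that may allocate memory. */
static int tbl_expand(struct fdtable *t)
{
    int newn = t->nslots > 0 ? t->nslots * 2 : 4;
    void **p = realloc(t->slots, (size_t)newn * sizeof(*p));

    if (p == NULL)
        return ENOMEM;
    for (int i = t->nslots; i < newn; i++)
        p[i] = NULL;
    t->slots = p;
    t->nslots = newn;
    return 0;
}

/* Assign every file to a slot; fds_out is the temporary integer array. */
int install_all(struct fdtable *t, void **files, int nfiles, int *fds_out)
{
    int i;

restart:
    for (i = 0; i < nfiles; i++) {
        assert(files[i] != NULL);
        if (tbl_alloc(t, &fds_out[i]) == ENOSPC) {
            /* Undo the assignments made so far, expand, start over. */
            while (--i >= 0)
                t->slots[fds_out[i]] = NULL;
            if (tbl_expand(t) != 0)
                return ENOMEM;
            goto restart;
        }
        t->slots[fds_out[i]] = files[i];
    }
    return 0;
}
```

Keeping the indexes in a separate array is what makes the retry safe:
nothing is written to the caller-visible message until every slot has been
assigned.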


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.196 01-Feb-2020 riastradh

Load struct fdfile::ff_file with atomic_load_consume.

Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)


# 1.195 01-Feb-2020 riastradh

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.


Revision tags: netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base
# 1.194 29-Jul-2019 maxv

branches: 1.194.4;
Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating system (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
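
The second item above concerns the gap between CMSG_LEN() (header plus data) and CMSG_SPACE() (the same, rounded up for alignment). A minimal userland sketch of building an SCM_RIGHTS message and zeroing that trailing gap follows. It is an illustration under stated assumptions, not the kernel code: sketch_build_rights() is a hypothetical helper, the caller is assumed to pass a buffer suitably aligned for struct cmsghdr, and only the trailing padding is handled here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill buf with one SCM_RIGHTS control message carrying nfds descriptors.
 * Returns the number of bytes occupied (CMSG_SPACE), or 0 if buf is too small.
 */
size_t
sketch_build_rights(void *buf, size_t bufsize, const int *fds, size_t nfds)
{
	struct cmsghdr *cm = buf;
	size_t datalen = nfds * sizeof(int);
	size_t len = CMSG_LEN(datalen);		/* header + data */
	size_t space = CMSG_SPACE(datalen);	/* len rounded up for alignment */

	if (bufsize < space)
		return 0;

	cm->cmsg_len = len;
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), fds, datalen);

	/* Scrub the alignment padding so no stale bytes reach the receiver. */
	memset((unsigned char *)buf + len, 0, space - len);
	return space;
}
```

Without the final memset(), whatever happened to be in the buffer between len and space would be copied out along with the message.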


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.
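
The SCM_RIGHTS transfer limit described above reduces to a single comparison. The following is a minimal sketch, not the kernel code: the function name, the parameter types, and the zero-ratio guard are assumptions made for illustration; only the comparison itself comes from the log message.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when a new SCM_RIGHTS transfer
 * should be refused because too many files are already in transit.
 */
static bool
rights_transfer_prohibited(unsigned int num_files_in_transit,
    unsigned int maxfiles, unsigned int unp_rights_ratio)
{
	/* Assumption for this sketch: a zero ratio refuses everything. */
	if (unp_rights_ratio == 0)
		return true;

	/* The comparison quoted in the log message. */
	return num_files_in_transit > maxfiles / unp_rights_ratio;
}
```

With maxfiles of 1000 and a ratio of 2, the sketch refuses a transfer once more than 500 files are in transit.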


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
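
Honoring the umask amounts to clearing the masked bits from the mode used for the new socket node. A minimal sketch follows; the helper name is hypothetical, and the 0777 starting mode is an assumption for illustration rather than something stated in the log message.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: permission bits for a newly bound socket node. */
static mode_t
socket_node_mode(mode_t cmask)
{
	/*
	 * Start from rwx for user, group and other (assumed base mode),
	 * then clear every bit that the umask masks out.
	 */
	return (mode_t)(0777 & ~cmask);
}
```

With the common umask of 022 this yields mode 0755.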


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
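
The allocate, undo, expand and retry loop described in this revision can be sketched with a toy descriptor table. Everything here is a hypothetical stand-in written for illustration: toy_fdalloc() and toy_fdexpand() only model the non-blocking and potentially-blocking halves described above, and the sketch omits the f_msgcount and unp_rights bookkeeping and all locking.

```c
#include <stdlib.h>

/* Toy descriptor table: a NULL slot is free. */
struct toy_fdtable {
	void **slots;
	int nslots;
};

/* Non-blocking half: return a free slot index, or -1 if the table is full. */
static int
toy_fdalloc(struct toy_fdtable *t)
{
	for (int i = 0; i < t->nslots; i++)
		if (t->slots[i] == NULL)
			return i;
	return -1;
}

/* Potentially-blocking half: grow the table. Returns 0 on success. */
static int
toy_fdexpand(struct toy_fdtable *t)
{
	int n = t->nslots > 0 ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return -1;
	for (int i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/*
 * Assign each of the nfiles (non-NULL) pointers to a slot, recording
 * the chosen indexes in the temporary array fds[]. If the table fills
 * up, undo the assignments made so far, expand, and start over.
 */
static int
toy_externalize(struct toy_fdtable *t, void **files, int nfiles, int *fds)
{
restart:
	for (int i = 0; i < nfiles; i++) {
		int fd = toy_fdalloc(t);

		if (fd < 0) {
			/* Undo everything assigned in this pass. */
			while (--i >= 0)
				t->slots[fds[i]] = NULL;
			if (toy_fdexpand(t) != 0)
				return -1;
			goto restart;
		}
		t->slots[fd] = files[i];
		fds[i] = fd;
	}
	return 0;
}
```

Keeping the indexes in a separate array is what makes the retry cheap: nothing is written to the message buffer until every file has a slot.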


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
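
The three macros named in this revision each have a distinct job, which a userland sketch can show: CMSG_SPACE() sizes the buffer, CMSG_LEN() goes into cmsg_len, and CMSG_DATA() locates the payload. The function name below is hypothetical and the code is an illustration of the macro usage, not the kernel's unp_internalize() or unp_externalize().

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Fill buf with one SCM_RIGHTS control message carrying
 * fds[0..nfds-1]. buf must be suitably aligned for struct cmsghdr.
 * Returns the number of buffer bytes used, or 0 if buf is too small.
 */
static size_t
build_rights_cmsg(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	size_t payload = nfds * sizeof(int);
	size_t space = CMSG_SPACE(payload);	/* header + payload + padding */
	struct cmsghdr *cm = buf;

	if (buflen < space)
		return 0;

	memset(buf, 0, space);			/* do not leave padding uninitialized */
	cm->cmsg_len = CMSG_LEN(payload);	/* header + payload, no trailing pad */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;

	/* Use CMSG_DATA(), not (cm + 1), to find the payload. */
	memcpy(CMSG_DATA(cm), fds, payload);
	return space;
}
```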


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().
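
The effect of a missing ALIGN() can be seen with a toy version of the macro. The TOY_ names below are hypothetical and the rounding form is the conventional one, assumed here rather than copied from any header; the 12-byte header size in the usage note is likewise an assumption about an ILP32 cmsghdr made of three 4-byte fields.

```c
#include <stdint.h>

/* Illustrative values only: ALIGNBYTES of 7 means 8-byte alignment. */
#define TOY_ALIGNBYTES	((uintptr_t)7)

/* Round an offset or address up to the next multiple of 8. */
#define TOY_ALIGN(p)	(((uintptr_t)(p) + TOY_ALIGNBYTES) & ~TOY_ALIGNBYTES)
```

Under those assumptions a header ends at offset 12, but TOY_ALIGN(12) is 16, so code that skips the rounding reads the data four bytes too early.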


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
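
The main hazard in this conversion is argument order: the b*() copy routines take the source first, while the mem*() routines take the destination first. The wrappers below are hypothetical, named compat_* only for illustration, and simply restate the four mappings listed above.

```c
#include <string.h>

/* bcopy(src, dst, len) -> memcpy(dst, src, len); regions must not overlap. */
static void
compat_bcopy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}

/* ovbcopy(src, dst, len) -> memmove(dst, src, len); overlap is allowed. */
static void
compat_ovbcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* bcmp(a, b, len) -> memcmp(a, b, len); zero means equal. */
static int
compat_bcmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}

/* bzero(p, len) -> memset(p, 0, len). */
static void
compat_bzero(void *p, size_t len)
{
	memset(p, 0, len);
}
```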


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.194 29-Jul-2019 maxv

Fix info leak: the padding after the header causes uninitialized heap
memory to be copied to userland in sys_recvmsg().


Revision tags: phil-wifi-20190609
# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

branches: 1.186.2;
Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
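
A minimal user-space model of the behaviour described above, offered as a sketch rather than the kernel code: `struct model_socket`, its field names, and both helper functions are invented for illustration, and the wakeup is reduced to a counter.

```c
#include <errno.h>

/* Hypothetical stand-in for the few socket fields the message mentions. */
struct model_socket {
	unsigned long overflowed;	/* receive-buffer overflow counter */
	int error;			/* pending error seen by the next read */
	unsigned int wakeups;		/* stands in for waking the receiver */
};

/* Model of the three steps: count the overflow, set ENOBUFS, wake up. */
static void
model_soroverflow(struct model_socket *so)
{
	so->overflowed++;
	so->error = ENOBUFS;
	so->wakeups++;
}

/*
 * Userland side (assumed usage): an ENOBUFS from a receive call means
 * messages were dropped, so cached state should be rebuilt.
 */
static int
reader_must_resync(int recv_errno)
{
	return recv_errno == ENOBUFS;
}
```

A reader of a routing socket would typically check `errno` after a failed receive and, when `reader_must_resync(errno)` is true, re-read the full state instead of trusting the message stream.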


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except when the
length of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_usrreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_usrreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_usrreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL in xxx_usrreq() function comments.
* fix various comment references for xxx_usrreq() that mentioned
PRU_CONTROL as xxx_usrreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
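
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment padding after the payload. A minimal user-space sketch of measuring and zeroing that gap follows; the helper names `cmsg_padding` and `cmsg_scrub_padding` are hypothetical, and this is not the code from unp_externalize.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: bytes of padding that follow the payload of a
 * control message carrying nfds file descriptors.
 */
static size_t
cmsg_padding(size_t nfds)
{
	return CMSG_SPACE(nfds * sizeof(int)) - CMSG_LEN(nfds * sizeof(int));
}

/*
 * Hypothetical helper: zero the bytes between CMSG_LEN and CMSG_SPACE so
 * that stale buffer contents are never copied out with the message.
 * buf must hold at least CMSG_SPACE(nfds * sizeof(int)) bytes.
 */
static void
cmsg_scrub_padding(void *buf, size_t nfds)
{
	size_t len = CMSG_LEN(nfds * sizeof(int));
	size_t space = CMSG_SPACE(nfds * sizeof(int));

	memset((char *)buf + len, 0, space - len);
}
```

The padding size depends on the platform's alignment rules, so the sketch computes it from the macros instead of assuming a fixed number.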


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.
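
For contrast with the mixed read()/recvmsg() sequence described above, here is a minimal user-space sketch of the straightforward pattern: one sendmsg() carrying one descriptor, matched by one recvmsg(). The helper names `send_fd` and `recv_fd` are hypothetical, and the one-byte payload is an assumption made to keep the example small.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one descriptor over a connected unix-domain socket. */
static int
send_fd(int sock, int fd)
{
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1. */
static int
recv_fd(int sock)
{
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return fd;
}
```

Reading the data with recvmsg() every time, instead of alternating with read(), keeps each control message paired with the data it was sent with.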


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
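
A minimal user-space sketch of the close-on-exec-at-creation idea, using the SOCK_CLOEXEC flag for socketpair(2) listed above. The helper names `socketpair_cloexec` and `fd_is_cloexec` are hypothetical; the point is that the flag is already set when the descriptors are returned, with no separate fcntl(F_SETFD) step during which another thread could fork and exec.

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Create a unix-domain stream pair with close-on-exec set atomically. */
static int
socketpair_cloexec(int sv[2])
{
	return socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int
fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```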


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has creeped
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not sync. This makes solocked2() and solocked()
unreliable and cause DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
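
The deferred-close scheme described in this entry can be modelled in user space as a queue with a per-file counter. The sketch below is an illustration under stated assumptions, not the kernel code: `struct xfile`, `struct defer_queue`, `defer_close` and `drain` are invented names, a single `holds` pointer stands in for descriptors in flight inside a socket, and locking is omitted.

```c
#include <stddef.h>

/* Hypothetical model of an open file that may carry another in flight. */
struct xfile {
	int refs;		/* remaining references */
	int defer_count;	/* closes requested but not yet performed */
	int queued;		/* nonzero while on the deferred list */
	struct xfile *holds;	/* file in transit inside this one, if any */
	struct xfile *next;
};

struct defer_queue {
	struct xfile *head;
	struct xfile *tail;
};

/* Scan side: never close here, only record that a close is owed. */
static void
defer_close(struct defer_queue *q, struct xfile *f)
{
	f->defer_count++;
	if (f->queued)
		return;		/* already enqueued: the count is enough */
	f->queued = 1;
	f->next = NULL;
	if (q->tail != NULL)
		q->tail->next = f;
	else
		q->head = f;
	q->tail = f;
}

/*
 * Collector side: close each file defer_count times and keep going until
 * the list is empty, since a final close may enqueue further files.
 * Returns the number of closes performed.
 */
static int
drain(struct defer_queue *q)
{
	int closed = 0;

	while (q->head != NULL) {
		struct xfile *f = q->head;
		int n = f->defer_count;

		q->head = f->next;
		if (q->head == NULL)
			q->tail = NULL;
		f->queued = 0;
		f->defer_count = 0;
		while (n-- > 0) {
			if (--f->refs == 0 && f->holds != NULL) {
				defer_close(q, f->holds);
				f->holds = NULL;
			}
			closed++;
		}
	}
	return closed;
}
```

Because the last close of a file only appends to the queue, the work is done by the loop in `drain` and the call depth stays constant, which is the property the message attributes to removing the recursive calls.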


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.
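
Taken together with the lower bound added in revision 1.117, the check amounts to a range test on cmsg_len. A minimal user-space sketch follows; `cmsg_len_ok` is a hypothetical helper, `buflen` stands in for control->m_len, and CMSG_LEN(0) is used as a portable way to express the aligned header size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: accept a control message header only if its
 * claimed length covers at least the aligned header and does not exceed
 * the buffer that holds it.
 */
static bool
cmsg_len_ok(const struct cmsghdr *cm, size_t buflen)
{
	size_t len = (size_t)cm->cmsg_len;

	if (len < CMSG_LEN(0))
		return false;	/* too short to contain a header */
	return len <= buflen;	/* must fit, but need not fill, the buffer */
}
```

Allowing `len` to be smaller than `buflen` is what lets a sender pass a buffer sized with CMSG_SPACE while setting cmsg_len with CMSG_LEN.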


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not handle.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.
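
The rule named in this message can be shown with a small userland sketch. It is not the code touched by this revision; sum_reverse() is a hypothetical helper that walks an array backwards while only ever forming pointers inside the array or one past its end.

```c
#include <stddef.h>

/*
 * Hypothetical helper: sum a[0..n-1] from the back without ever
 * forming a pointer before a[0].
 */
static long
sum_reverse(const int *a, size_t n)
{
	const int *p = a + n;	/* one past the end: valid to form, not to dereference */
	long sum = 0;

	while (p != a)
		sum += *--p;	/* step back first, then read */
	return sum;
}
```

A loop of the form "for (p = a + n - 1; p >= a; p--)" would compute a - 1 on its last step, which is the case the commit message warns about.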


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
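
The retry scheme described in this message can be modelled in userland as a minimal sketch. Everything here is a toy stand-in, not the kernel code: struct toy_table, toy_fdalloc(), toy_fdexpand() and toy_assign_all() are hypothetical names, and the table is just a byte array of "slot in use" flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a descriptor table; not the kernel's struct filedesc. */
struct toy_table {
	unsigned char *used;	/* used[i] != 0 means slot i is taken */
	int nslots;
};

/* Models the non-blocking allocator: never grows the table. */
static int
toy_fdalloc(struct toy_table *t, int *slot)
{
	for (int i = 0; i < t->nslots; i++) {
		if (!t->used[i]) {
			t->used[i] = 1;
			*slot = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Models the expansion step: the only place that allocates memory. */
static int
toy_fdexpand(struct toy_table *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	unsigned char *p = realloc(t->used, (size_t)n);

	if (p == NULL)
		return ENOMEM;
	memset(p + t->nslots, 0, (size_t)(n - t->nslots));
	t->used = p;
	t->nslots = n;
	return 0;
}

/*
 * Assign nfiles slots, recording them in the temporary array slots[].
 * If the table is full, undo what was assigned, expand, and start over.
 */
static int
toy_assign_all(struct toy_table *t, int nfiles, int *slots)
{
	int i, error;

	assert(nfiles >= 0);
restart:
	for (i = 0; i < nfiles; i++) {
		error = toy_fdalloc(t, &slots[i]);
		if (error == ENOSPC) {
			while (--i >= 0)
				t->used[slots[i]] = 0;	/* undo earlier assignments */
			error = toy_fdexpand(t);
			if (error)
				return error;
			goto restart;
		}
	}
	return 0;
}
```

Because the slot numbers live in a temporary array until the loop completes, a restart never leaves half-assigned entries visible to the caller, which is the property the message attributes to the rewritten unp_externalize().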


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
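
For illustration, the same three macros are what a userland sender uses when building an SCM_RIGHTS message. The sketch below is not the kernel change itself; build_rights_cmsg() is a hypothetical helper, and the caller is assumed to supply a buffer aligned for struct cmsghdr.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill msg with one SCM_RIGHTS control message
 * carrying fd. buf/buflen is caller-supplied storage, aligned for
 * struct cmsghdr. Returns 0 on success, -1 if buf is too small.
 */
static int
build_rights_cmsg(struct msghdr *msg, void *buf, size_t buflen, int fd)
{
	struct cmsghdr *cm;

	assert(msg != NULL && buf != NULL);
	if (buflen < CMSG_SPACE(sizeof(fd)))
		return -1;

	memset(buf, 0, CMSG_SPACE(sizeof(fd)));
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(fd));	/* header + padded payload */

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(fd));		/* header + unpadded payload */
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));		/* CMSG_DATA(), not (cm + 1) */
	return 0;
}
```

CMSG_SPACE() sizes the buffer, CMSG_LEN() goes into cmsg_len, and CMSG_DATA() locates the payload; on platforms where the header size is not a multiple of the required alignment, (cm + 1) would point at the wrong offset.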


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
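
The point of the table above is that the source and destination arguments swap places. A minimal sketch of the mapping, using hypothetical compat_*() wrappers that are not part of the tree:

```c
#include <string.h>

/* Hypothetical wrappers showing the argument order of each rewrite. */

/* bcopy(x, y, z) -> memcpy(y, x, z): regions must not overlap. */
static void
compat_bcopy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}

/* ovbcopy(x, y, z) -> memmove(y, x, z): overlapping regions are allowed. */
static void
compat_ovbcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* bcmp(x, y, z) -> memcmp(x, y, z): zero still means "equal". */
static int
compat_bcmp(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len);
}

/* bzero(x, y) -> memset(x, 0, y) */
static void
compat_bzero(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```

Note that memcmp() also reports ordering through the sign of its result, whereas callers of bcmp() could only rely on zero versus non-zero.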


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.193 03-Jun-2019 msaitoh

Fix typo in comment (s/seperate/separate/).


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
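
The gap between CMSG_LEN and CMSG_SPACE exists because CMSG_SPACE rounds
the message up to the alignment boundary. The same scrubbing idea, sketched
in userland with a hypothetical helper build_rights_cmsg() (this is an
illustration, not the kernel code from this revision):

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill buf with a single SCM_RIGHTS message that
 * carries nfds descriptors (nfds >= 1). buf must be suitably aligned
 * for struct cmsghdr. The whole CMSG_SPACE() region is zeroed first,
 * so the padding that follows CMSG_LEN() never carries stale bytes.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
size_t
build_rights_cmsg(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	size_t datalen = nfds * sizeof(int);
	size_t space = CMSG_SPACE(datalen);
	struct cmsghdr *cm = buf;

	if (buflen < space)
		return 0;

	memset(buf, 0, space);		/* scrub header, data and padding */
	cm->cmsg_len = CMSG_LEN(datalen);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), fds, datalen);
	return space;
}
```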


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
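
Of the interfaces listed above, MSG_CMSG_CLOEXEC is the one that touches
unix-domain descriptor passing. A minimal userland sketch follows; the
helper names send_fd(), recv_fd_cloexec() and passed_fd_is_cloexec() are
hypothetical and chosen only for this illustration:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected unix-domain socket. */
static int
send_fd(int sock, int fd)
{
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor, asking for close-on-exec at delivery time. */
static int
recv_fd_cloexec(int sock)
{
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return fd;
}

/* Returns 1 if the passed descriptor has FD_CLOEXEC set, 0 if not, -1 on error. */
int
passed_fd_is_cloexec(void)
{
	int sv[2], fd, flags;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (send_fd(sv[1], sv[1]) == -1 ||
	    (fd = recv_fd_cloexec(sv[0])) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	flags = fcntl(fd, F_GETFD);
	close(fd);
	close(sv[0]);
	close(sv[1]);
	return flags == -1 ? -1 : (flags & FD_CLOEXEC) != 0;
}
```

Setting the flag inside recvmsg(2) closes the window in which another
thread could fork and exec between delivery and a later fcntl(2) call.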


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

3. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

4. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

5. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

6. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

7. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
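
The limit in item 6 can be read as a simple predicate. The sketch below
only restates that inequality; the function name is hypothetical, the
values are caller-supplied rather than kernel globals, and whether the
kernel counts the descriptors being sent before comparing is not stated in
this message:

```c
/*
 * Hypothetical restatement of item 6: a transfer is refused once
 * in_transit > maxfiles / rights_ratio. A zero ratio is treated as
 * "refuse everything" to avoid dividing by zero (an assumption of
 * this sketch, not something the commit message specifies).
 */
int
rights_transfer_allowed(unsigned int in_transit, unsigned int maxfiles,
    unsigned int rights_ratio)
{
	if (rights_ratio == 0)
		return 0;
	return in_transit <= maxfiles / rights_ratio;
}
```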


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.
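
Taken together with the lower bound from revision 1.117, the accepted range
for cmsg_len can be sketched as below. This is a userland analogue with a
hypothetical function name; buflen stands in for control->m_len, and
CMSG_LEN(0) is used as a portable spelling of the aligned header size:

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical check: a control message header is acceptable when its
 * claimed length covers at least the aligned header and does not run
 * past the buffer that holds it. Returns 1 if acceptable, 0 otherwise.
 */
int
cmsg_len_ok(const struct cmsghdr *cm, size_t buflen)
{
	if (buflen < sizeof(*cm))
		return 0;
	return cm->cmsg_len >= CMSG_LEN(0) && cm->cmsg_len <= buflen;
}
```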


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not handle.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.
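
The commit message does not show the code it changed. As a generic
illustration of the rule, a backwards walk can start from the one-past-the-
end pointer and step back before each read, so the pointer a - 1 is never
formed (sum_reverse() is a hypothetical name):

```c
#include <stddef.h>

/* Sum an array back to front without ever forming the pointer a - 1. */
long
sum_reverse(const int *a, size_t n)
{
	const int *p = a + n;	/* one past the end: valid to form, not to read */
	long total = 0;

	while (p != a)
		total += *--p;	/* step back first, then dereference */
	return total;
}
```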


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
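
The effect is visible from userland by binding a socket under a known umask
and inspecting the resulting node. A minimal sketch, assuming a hypothetical
helper bound_socket_mode() and a caller-chosen path:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a fresh unix-domain socket at path while
 * the given umask is in effect, then return the permission bits of the
 * socket node, or -1 on error. The caller removes the node afterwards.
 */
int
bound_socket_mode(const char *path, mode_t mask)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int fd, rv = -1;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return -1;

	old = umask(mask);		/* applies to the node bind(2) creates */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    stat(path, &st) == 0)
		rv = (int)(st.st_mode & 0777);
	umask(old);			/* restore the caller's umask */

	close(fd);
	return rv;
}
```

With a umask of 077, a system that honors it leaves no group or other
permission bits set on the socket.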


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
((pr->pru_usrreq)() calls for consistency, instead of (struct proc * )0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data in a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
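
The retry loop described in this commit can be sketched in userland as follows. This is only an illustration under stated assumptions: struct toy_fdtab, toy_fdalloc(), toy_fdexpand() and toy_externalize() are hypothetical stand-ins, not the kernel's filedesc code, and the sketch omits the f_msgcount/unp_rights accounting and the copy into the message buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy descriptor table (hypothetical; NOT the kernel's struct filedesc). */
struct toy_fdtab {
	void **slots;	/* NULL means the slot is free */
	int nslots;
};

/* Stand-in for the non-blocking fdalloc(): never grows the table. */
static int
toy_fdalloc(struct toy_fdtab *t, int *fdp)
{
	for (int i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*fdp = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Stand-in for fdexpand(): the only step that allocates memory. */
static int
toy_fdexpand(struct toy_fdtab *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, n * sizeof(*p));

	if (p == NULL)
		return ENOMEM;
	for (int i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/*
 * Assign nfiles objects to table slots, all or nothing.
 * fds[] plays the role of the temporary array of integers.
 */
static int
toy_externalize(struct toy_fdtab *t, void **files, int nfiles, int *fds)
{
	int error;

	assert(nfiles >= 0);
restart:
	for (int i = 0; i < nfiles; i++) {
		error = toy_fdalloc(t, &fds[i]);
		if (error == ENOSPC) {
			/* Undo the assignments so far, expand, retry. */
			while (--i >= 0)
				t->slots[fds[i]] = NULL;
			if ((error = toy_fdexpand(t)) != 0)
				return error;
			goto restart;
		}
		t->slots[fds[i]] = files[i];
	}
	return 0;
}
```

Keeping the indexes in a separate array is what makes the undo step cheap: nothing visible to the receiver has been modified until every slot has been claimed.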


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
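
For readers unfamiliar with the three macros, a minimal userland sketch of how they fit together when building an SCM_RIGHTS message might look like this. The helper name build_rights_cmsg() is hypothetical and is not part of the kernel change; the kernel code works on mbufs rather than a calloc()ed buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Build a control message carrying nfds descriptors.
 * Returns a calloc()ed buffer of *spacep bytes, or NULL on failure.
 * (Hypothetical helper, for illustration only.)
 */
static struct cmsghdr *
build_rights_cmsg(const int *fds, size_t nfds, size_t *spacep)
{
	/* CMSG_SPACE: header + payload + trailing padding (buffer size). */
	size_t space = CMSG_SPACE(nfds * sizeof(int));
	struct cmsghdr *cm;

	assert(fds != NULL && spacep != NULL);
	cm = calloc(1, space);
	if (cm == NULL)
		return NULL;

	/* CMSG_LEN: header + payload, without trailing padding. */
	cm->cmsg_len = CMSG_LEN(nfds * sizeof(int));
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;

	/* CMSG_DATA: correctly aligned payload start, not (cm + 1). */
	memcpy(CMSG_DATA(cm), fds, nfds * sizeof(int));

	*spacep = space;
	return cm;
}
```

The buffer would then be handed to sendmsg(2) through msg_control and msg_controllen, with *spacep as the length.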


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
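
The point that is easy to miss in this mapping is the swapped source and destination arguments. A small sketch of the post-conversion form, with hypothetical wrapper names used purely for illustration:

```c
#include <assert.h>
#include <string.h>

/* Was: bcopy(src, dst, len). Note that dst now comes first. */
static void
copy_block(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, len);
}

/* Was: ovbcopy(buf, buf + 1, len - 1). Overlap needs memmove. */
static void
shift_right_one(char *buf, size_t len)
{
	if (len > 1)
		memmove(buf + 1, buf, len - 1);
}

/* Was: bzero(p, len). */
static void
clear_block(void *p, size_t len)
{
	memset(p, 0, len);
}

/* Was: bcmp(a, b, len) == 0. Argument order is unchanged. */
static int
same_block(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```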


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.
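
One way to convert between an array of ints and an array of pointer-sized values in the same buffer, without assuming the two sizes match, is to walk backwards when widening and forwards when narrowing. The sketch below illustrates that idea only; it is not taken from this commit, the function names are hypothetical, and a plain integer cast stands in for the fd to struct file * lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Widen n ints at the start of buf into n pointer-sized values, in place.
 * buf must have room for n * sizeof(uintptr_t) bytes.
 * Walking backwards means a wide write never lands on an int that has
 * not been read yet.
 */
static void
widen_in_place(unsigned char *buf, size_t n)
{
	assert(buf != NULL);
	for (size_t i = n; i-- > 0; ) {
		int fd;
		uintptr_t wide;

		memcpy(&fd, buf + i * sizeof(int), sizeof(fd));
		wide = (uintptr_t)fd;	/* stand-in for the real lookup */
		memcpy(buf + i * sizeof(wide), &wide, sizeof(wide));
	}
}

/* The reverse conversion walks forwards for the same reason. */
static void
narrow_in_place(unsigned char *buf, size_t n)
{
	assert(buf != NULL);
	for (size_t i = 0; i < n; i++) {
		uintptr_t wide;
		int fd;

		memcpy(&wide, buf + i * sizeof(wide), sizeof(wide));
		fd = (int)wide;
		memcpy(buf + i * sizeof(int), &fd, sizeof(fd));
	}
}
```

Using memcpy() for every access also sidesteps alignment assumptions about the buffer, which is the class of problem several later revisions in this log deal with.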


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base
# 1.192 01-Mar-2019 pgoyette

Rename the MODULE_*_HOOK() macros to MODULE_HOOK_*() as briefly
discussed on irc.

NFCI intended.

Ride the earlier kernel bump - it's getting crowded.


# 1.191 20-Feb-2019 pgoyette

compat70_ocreds_valid is not a pointer to a boolean, it is the boolean
itself which controls whether or not we recognize the OCRED options.

Should fix the panic identified in PR kern/53991 (awaiting confirmation
from submitter).


# 1.190 04-Feb-2019 mrg

add or adjust fallthru comments.


# 1.189 29-Jan-2019 pgoyette

Normalize all the compat hooks' names to the form

<subsystem>_<function>_<version>_hook

NFCI

XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.


# 1.188 27-Jan-2019 pgoyette

Merge the [pgoyette-compat] branch


Revision tags: pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.187 08-Nov-2018 roy

Don't call soroverflow when we return the error to the sender.

Thanks to thorpej@ for a sanity check.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.186 11-May-2018 roy

Increase the default size of some receive buffers from 8k to 16k.
This mitigates recent reports of socket overflow errors
and fixes PR bin/53247.


# 1.185 05-May-2018 christos

bump PIPSIZ from 4 to 8K like FreeBSD and provide the same sysctls


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.184 19-Mar-2018 roy

socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.183 17-Feb-2018 christos

branches: 1.183.2;
fix LOCAL_PEEREID to not return the same info for both sides...
XXX: pullup-{7,8}


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

branches: 1.181.8;
Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEEREID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make behavior of getsockname(2) (and maybe getpeername(2)) as the same as
NetBSD 6.1_STABLE and other operating system (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req = PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() -> {l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_usrreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_usrreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_usrreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL in xxx_usrreq() function comments.
* fix various comment references for xxx_usrreq() that mentioned
PRU_CONTROL as xxx_usrreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.42!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_INACTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.
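
As a userland illustration of the length issue (a minimal sketch, not the
kernel code from this commit): the meaningful length of a sockaddr_un
depends on the path stored in it, while sizeof(struct sockaddr_un) is only
a fixed upper bound. The helper name sun_used_len() is hypothetical, and
whether a terminating NUL is counted differs between systems, so it is
left out here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: number of bytes of a sockaddr_un occupied by the
 * fixed header plus `path', not counting a terminating NUL.
 * Returns 0 if the path does not fit in sun_path.
 */
static size_t
sun_used_len(const char *path)
{
	size_t plen = strlen(path);

	if (plen >= sizeof(((struct sockaddr_un *)0)->sun_path))
		return 0;
	return offsetof(struct sockaddr_un, sun_path) + plen;
}
```

For a short path the result is well below sizeof(struct sockaddr_un),
which is why reporting the full structure size for every address is
misleading.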


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
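
The CMSG_LEN/CMSG_SPACE point can be shown in userland terms (a sketch
under stated assumptions, not the unp_externalize() code): CMSG_LEN()
covers the header plus data, CMSG_SPACE() rounds that up for alignment,
and the bytes in between are padding that should be zeroed before the
buffer is handed to anyone else. build_rights_cmsg() is a hypothetical
helper name.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build one SCM_RIGHTS message carrying `nfds'
 * descriptors in `buf'.  `buf' must be suitably aligned for a
 * struct cmsghdr.  Returns the bytes used (CMSG_SPACE), or 0 if the
 * request is empty or does not fit.
 */
static size_t
build_rights_cmsg(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	size_t datalen = nfds * sizeof(int);
	size_t len = CMSG_LEN(datalen);		/* header + data */
	size_t space = CMSG_SPACE(datalen);	/* len rounded up for alignment */
	struct cmsghdr *cm = buf;

	if (nfds == 0 || buflen < space)
		return 0;

	cm->cmsg_len = len;
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cm), fds, datalen);

	/* Scrub the trailing alignment padding so stale bytes never leak. */
	memset((char *)buf + len, 0, space - len);
	return space;
}
```

Whether there is any padding at all depends on the platform's alignment
and on the number of descriptors; the memset() is a no-op when
CMSG_LEN and CMSG_SPACE coincide.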


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
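
The benefit of setting close-on-exec at creation time is that there is no
window between creating the descriptor and a later fcntl(F_SETFD) in which
another thread could fork and exec. A minimal userland sketch, assuming a
system that provides SOCK_CLOEXEC for socketpair(2); the function name
cloexec_socketpair_check() is hypothetical:

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical check: returns 1 if both ends of a socketpair created
 * with SOCK_CLOEXEC already carry FD_CLOEXEC, 0 if not, -1 on error.
 */
static int
cloexec_socketpair_check(void)
{
	int sv[2], f0, f1;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) == -1)
		return -1;

	/* Read back the descriptor flags; no separate F_SETFD was issued. */
	f0 = fcntl(sv[0], F_GETFD);
	f1 = fcntl(sv[1], F_GETFD);
	close(sv[0]);
	close(sv[1]);

	if (f0 == -1 || f1 == -1)
		return -1;
	return (f0 & FD_CLOEXEC) != 0 && (f1 & FD_CLOEXEC) != 0;
}
```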


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
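
The deferred-close idea in this commit can be pictured with a toy model
(a sketch under stated assumptions, not the kernel code: struct toy_file,
defer_close(), close_once() and drain_deferred() are hypothetical names,
and all locking is omitted). Instead of closing a file recursively from
within another close, the file is queued with a per-file count and a
single loop drains the queue until it is empty.

```c
#include <stddef.h>

/* Toy stand-in for a file that may keep another file alive "in transit". */
struct toy_file {
	int refs;			/* outstanding references */
	int deferred;			/* pending deferred closes */
	struct toy_file *held;		/* file referenced by a queued message */
	struct toy_file *next;		/* link on the deferred-close list */
};

static struct toy_file *defer_head;	/* global deferred-close list */

/* Queue one close; if already queued, only bump the count. */
static void
defer_close(struct toy_file *fp)
{
	if (fp->deferred++ == 0) {
		fp->next = defer_head;
		defer_head = fp;
	}
}

/* Drop one reference; on last close, defer the held file (no recursion). */
static void
close_once(struct toy_file *fp)
{
	if (--fp->refs == 0 && fp->held != NULL) {
		defer_close(fp->held);
		fp->held = NULL;
	}
}

/* Drain the list; closing may queue more files. Returns closes done. */
static int
drain_deferred(void)
{
	int closes = 0;

	while (defer_head != NULL) {
		struct toy_file *fp = defer_head;
		int n = fp->deferred;

		defer_head = fp->next;
		fp->next = NULL;
		fp->deferred = 0;
		while (n-- > 0) {
			close_once(fp);
			closes++;
		}
	}
	return closes;
}
```

With a chain a -> b -> c, one deferred close of a is enough for the drain
loop to reach all three files iteratively, with bounded stack depth.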


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
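
Combined with the upper bound described in revision 1.116, the resulting
sanity check on a control message length can be sketched as follows. This
is an illustration only: cmsg_len_ok() is a hypothetical name, buf_len
stands in for control->m_len, and CMSG_LEN(0) is used as a portable way to
spell the aligned header size.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical check: a control message length is plausible only if it
 * covers at least the aligned header and does not exceed the buffer
 * that carries it.
 */
static int
cmsg_len_ok(size_t cmsg_len, size_t buf_len)
{
	/* CMSG_LEN(0) is the aligned cmsghdr size with no payload. */
	return cmsg_len >= CMSG_LEN(0) && cmsg_len <= buf_len;
}
```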


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
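
The "larval" descriptor idea can be sketched with a toy table (an
illustration under stated assumptions, not the kernel's filedesc code:
every toy_* name and the TOY_LARVAL flag are hypothetical, and locking is
omitted). The allocator sees a larval slot as occupied, while lookups
treat it as if it did not exist until it is marked mature.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NSLOTS	8
#define TOY_LARVAL	0x1	/* slot reserved, file not fully constructed */

struct toy_slot {
	void *file;
	unsigned flags;
};

struct toy_fdtab {
	struct toy_slot slot[TOY_NSLOTS];
};

static void
toy_fdtab_init(struct toy_fdtab *t)
{
	memset(t, 0, sizeof(*t));
}

/* Reserve the lowest free slot in the larval state; -1 if the table is full. */
static int
toy_alloc_larval(struct toy_fdtab *t, void *file)
{
	for (int fd = 0; fd < TOY_NSLOTS; fd++) {
		if (t->slot[fd].file == NULL) {
			t->slot[fd].file = file;
			t->slot[fd].flags = TOY_LARVAL;
			return fd;
		}
	}
	return -1;
}

/* Construction finished: clear the larval flag. */
static void
toy_set_mature(struct toy_fdtab *t, int fd)
{
	t->slot[fd].flags &= ~TOY_LARVAL;
}

/* Lookup: larval slots are reported as non-existent descriptors. */
static void *
toy_getfile(struct toy_fdtab *t, int fd)
{
	if (fd < 0 || fd >= TOY_NSLOTS)
		return NULL;
	if (t->slot[fd].file == NULL || (t->slot[fd].flags & TOY_LARVAL))
		return NULL;
	return t->slot[fd].file;
}
```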


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
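
The assign/undo/expand/retry loop listed above might look roughly like
the following userland model. It is a sketch under stated assumptions:
toy_alloc() and toy_expand() are hypothetical stand-ins for fdalloc()
and fdexpand(), and the counter updates and message-buffer copy from the
last two steps are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a descriptor table; not the kernel's struct filedesc. */
struct toy_table {
	const void **slots;	/* NULL entry means the slot is free */
	int nslots;
};

/* Like the non-blocking fdalloc(): return a free index, or -1 if full. */
int
toy_alloc(struct toy_table *t)
{
	for (int i = 0; i < t->nslots; i++)
		if (t->slots[i] == NULL)
			return i;
	return -1;
}

/* Like fdexpand(): grow the table; in the kernel this step may block. */
int
toy_expand(struct toy_table *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	const void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return -1;
	for (int i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/*
 * Assign every file to a slot, recording the indexes in the temporary
 * array fds[].  If the table fills up part way through, undo the
 * assignments made so far, expand, and retry the whole process.
 */
int
toy_externalize(struct toy_table *t, const void **files, int nfiles, int *fds)
{
	assert(nfiles >= 0);
restart:
	for (int i = 0; i < nfiles; i++) {
		fds[i] = toy_alloc(t);
		if (fds[i] == -1) {
			while (--i >= 0)		/* undo */
				t->slots[fds[i]] = NULL;
			if (toy_expand(t) != 0)
				return -1;
			goto restart;
		}
		t->slots[fds[i]] = files[i];
	}
	return 0;
}
```

Because the indexes live in a temporary array until the loop has fully
succeeded, a retry never leaves half-written results visible to the
caller.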


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
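
From the userland side, the same three macros are what a sender and
receiver of SCM_RIGHTS would normally use. The sketch below is
illustrative only; fill_rights() and read_rights() are hypothetical
helper names, and the caller is assumed to supply a zeroed msghdr and a
suitably aligned buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Fill `buf` with a single SCM_RIGHTS control message carrying `fd`
 * and point `msg` at it.  `buf` must be aligned for struct cmsghdr and
 * at least CMSG_SPACE(sizeof(int)) bytes long.
 */
void
fill_rights(struct msghdr *msg, void *buf, size_t buflen, int fd)
{
	struct cmsghdr *cm;

	assert(buflen >= CMSG_SPACE(sizeof(int)));
	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));	/* padded size */

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));		/* header + payload */
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));		/* not (cm + 1) */
}

/* Return the first descriptor in an SCM_RIGHTS message, or -1 if none. */
int
read_rights(struct msghdr *msg)
{
	struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
	int fd = -1;

	if (cm != NULL && cm->cmsg_level == SOL_SOCKET &&
	    cm->cmsg_type == SCM_RIGHTS &&
	    cm->cmsg_len >= CMSG_LEN(sizeof(int)))
		memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return fd;
}
```

Using CMSG_DATA() rather than pointer arithmetic on the header keeps the
payload offset correct on platforms where the header size and the
required alignment differ.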


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
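
The detail that is easiest to get wrong in this mapping is the argument
order: bcopy() takes (source, destination) while memcpy() takes
(destination, source). A minimal userland sketch, with hypothetical
helper names chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `len` bytes into a `dstsize`-byte destination and zero the rest.
 *
 * Old style:  bcopy(src, dst, len);  bzero(dst + len, dstsize - len);
 * New style:  memcpy(dst, src, len); memset(dst + len, 0, dstsize - len);
 */
void
copy_and_pad(char *dst, size_t dstsize, const char *src, size_t len)
{
	assert(len <= dstsize);
	memcpy(dst, src, len);			/* was: bcopy(src, dst, len) */
	memset(dst + len, 0, dstsize - len);	/* was: bzero(dst + len, ...) */
}

/* Returns 1 if the buffers are equal; was: bcmp(a, b, len) == 0. */
int
same_bytes(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```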


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: tls-maxphys-base-20171202
# 1.182 02-Dec-2017 mrg

include opt_compat_netbsd.h, so that eg COMPAT_70 will be set.


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107 nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of the sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, a runtime problem with the lmtp service of
dovecot2 on NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.
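
The gap between CMSG_LEN and CMSG_SPACE mentioned above is alignment
padding after the payload. A hedged userland sketch of scrubbing it
follows; scrub_cmsg_padding() is a hypothetical helper written for
illustration, not the code that was committed.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Zero the alignment padding that follows a control message payload of
 * `datalen` bytes, so stale memory is not copied out to the receiver.
 * The buffer behind `cm` must be at least CMSG_SPACE(datalen) bytes.
 * Returns the number of padding bytes cleared.
 */
size_t
scrub_cmsg_padding(struct cmsghdr *cm, size_t datalen)
{
	size_t used = CMSG_LEN(datalen);	/* header + payload */
	size_t total = CMSG_SPACE(datalen);	/* header + payload + padding */

	assert(total >= used);
	memset((char *)cm + used, 0, total - used);
	return total - used;
}
```

The amount of padding is platform dependent, so callers should not
assume any particular return value.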

ok christos


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allows
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst
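
The length checks described in revisions 1.117 and 1.116 can be pictured with a small userland sketch. This is not the kernel code: `cmsg_len_ok` is a hypothetical helper, `buflen` stands in for `control->m_len`, and the sketch assumes that `CMSG_LEN(0)` equals the aligned header size, which is how common implementations define it.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return 1 if the control message header at `cm`
 * has a plausible cmsg_len for a buffer of `buflen` bytes, else 0.
 */
static int
cmsg_len_ok(const struct cmsghdr *cm, size_t buflen)
{
	/* The buffer must at least hold a complete header. */
	if (buflen < sizeof(*cm))
		return 0;
	/* Lower bound: the message covers at least the aligned header. */
	if (cm->cmsg_len < CMSG_LEN(0))
		return 0;
	/* Upper bound: "<=" rather than "==", so trailing slack is allowed. */
	if (cm->cmsg_len > buflen)
		return 0;
	return 1;
}
```

A message shorter than the buffer passes, while one that claims more bytes than the buffer holds, or fewer than a header, is rejected.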


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req-> r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a newly allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
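
The allocate, undo, expand and retry loop described for unp_externalize() can be sketched in plain userland C. Everything below is an illustrative assumption rather than the kernel code: `toy_table`, `toy_alloc`, `toy_expand` and `toy_externalize` are hypothetical names, and a NULL slot stands for a free descriptor, so the objects being placed must be non-NULL.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy descriptor table; a NULL slot means "free". */
struct toy_table {
	void **slots;
	int nslots;
};

/* Like the fdalloc() described above: never grows the table. */
static int
toy_alloc(struct toy_table *t, int *fdp)
{
	for (int i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*fdp = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Like fdexpand(): the only step that allocates memory. */
static int
toy_expand(struct toy_table *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return ENOMEM;
	for (int i = t->nslots; i < n; i++)
		p[i] = NULL;
	t->slots = p;
	t->nslots = n;
	return 0;
}

/* Place nfiles non-NULL objects; fds[] is the temporary index array. */
static int
toy_externalize(struct toy_table *t, void **files, int nfiles, int *fds)
{
	int i, error;

restart:
	for (i = 0; i < nfiles; i++) {
		error = toy_alloc(t, &fds[i]);
		if (error != 0) {
			/* Undo the assignments done so far. */
			while (--i >= 0)
				t->slots[fds[i]] = NULL;
			/* Expand, then retry the whole process. */
			error = toy_expand(t);
			if (error != 0)
				return error;
			goto restart;
		}
		t->slots[fds[i]] = files[i];
	}
	return 0;
}
```

Because the indexes live in a separate array until the loop finishes, a failed pass leaves the table exactly as it was before the pass started.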


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
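
As a userland illustration of the three macros named in this entry, the sketch below builds a single SCM_RIGHTS control message. It is not the kernel change itself: `build_rights_cmsg` is a hypothetical helper, and the caller is assumed to supply a buffer that is suitably aligned for `struct cmsghdr`.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill `buf` with one SCM_RIGHTS message carrying
 * `nfds` descriptors. Returns the bytes of control data used, or 0 if
 * the buffer is too small.
 */
static size_t
build_rights_cmsg(void *buf, size_t buflen, const int *fds, size_t nfds)
{
	size_t payload = nfds * sizeof(int);
	/* CMSG_SPACE(): header plus payload, including trailing padding. */
	size_t space = CMSG_SPACE(payload);
	struct cmsghdr *cm;

	if (space > buflen)
		return 0;
	memset(buf, 0, space);
	cm = buf;
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	/* CMSG_LEN(): the value stored in cmsg_len, without trailing padding. */
	cm->cmsg_len = CMSG_LEN(payload);
	/* CMSG_DATA(): start of the payload, not simply (cm + 1). */
	memcpy(CMSG_DATA(cm), fds, payload);
	return space;
}
```

The returned size is what a caller would place in `msg_controllen` before calling sendmsg(2).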


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
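
The mapping above swaps the first two arguments for the copy routines, which is easy to get wrong when converting by hand. The sketch below restates it as small wrappers; the wrapper names are hypothetical and exist only for illustration.

```c
#include <string.h>

/* Old: bcopy(src, dst, len);   New: memcpy(dst, src, len). */
static void
copy_nonoverlapping(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Old: ovbcopy(src, dst, len); New: memmove(dst, src, len). */
static void
copy_overlapping(void *dst, const void *src, size_t len)
{
	memmove(dst, src, len);
}

/* Old: bcmp(a, b, len) != 0;   New: memcmp(a, b, len) != 0. */
static int
buffers_differ(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) != 0;
}

/* Old: bzero(buf, len);        New: memset(buf, 0, len). */
static void
clear_buffer(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```

Only the two copy routines change argument order; the compare and clear conversions keep the buffer first.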


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.181 31-Oct-2016 maxv

Memory leak, found by Mootja. It is easily triggerable from userland.


Revision tags: nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529 nick-nhusb-base-20160422
# 1.180 06-Apr-2016 roy

branches: 1.180.2;
Add sc_pid to sockcred so that SOCK_DGRAM and LOCAL_CREDS socket option
can learn the process id of the AF_LOCAL sender.
Add compat glue for old structure.


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921 nick-nhusb-base-20150606
# 1.179 02-May-2015 rtr

make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}


# 1.178 26-Apr-2015 rtr

remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@


# 1.177 24-Apr-2015 rtr

make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19


Revision tags: nick-nhusb-base-20150406
# 1.176 03-Apr-2015 rtr

* change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@


# 1.175 01-Mar-2015 christos

PR/39918: ITOH Yasufumi: Replace KASSERT with continue, since the file
descriptor can be closed since closef() does not pay attention to FDEFER.
XXX: Pullup-7


# 1.174 28-Feb-2015 rtr

Place opening brace to function at column 0, like in the rest of the file.


# 1.173 02-Feb-2015 christos

Handle LOCAL_PEERID for socketpair() connected sockets which connect through
connect2().
1. move the code that sets the peerid structure into connect1(). This
handles so2. The datagram code calls connect2 twice with flipped
so arguments so both sockets get set.
2. in connect2 copy the peerid structure from so2 to so, so that
both stream sockets get set.


Revision tags: nick-nhusb-base
# 1.172 08-Oct-2014 taca

branches: 1.172.2;
Make the behavior of getsockname(2) (and maybe getpeername(2)) the same as
NetBSD 6.1_STABLE and other operating systems (OS X 10.9.5).

* sa_len of sockaddr_un structure is always set to sizeof(sun_path).
* pathname stored in sun_path is always '\0' terminated (except length
of sun_path is sizeof(sun_path)?).

Should fix PR kern/49247, runtime problem of lmtp service of dovecot2 on
NetBSD current and NetBSD 7.0_BETA.


# 1.171 05-Sep-2014 matt

Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.


# 1.170 05-Sep-2014 matt

Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.


Revision tags: netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 09-Aug-2014 rtr

branches: 1.169.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@


# 1.168 08-Aug-2014 rtr

split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()


# 1.167 05-Aug-2014 rtr

actually use the passed in struct lwp *l instead of curlwp in unp_connect()


# 1.166 05-Aug-2014 rtr

split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind


# 1.165 05-Aug-2014 rtr

revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@


# 1.164 31-Jul-2014 rtr

* remove declarations of unp_bind, unp_discard, unp_disconnect1, unp_drop,
unp_shutdown1, unp_internalize and unp_output functions from sys/un.h
and instead declare them as static in uipc_usrreq.c with prototype
declarations as necessary.

* remove struct lwp * parameter from unp_output() while here and just
use curlwp instead.

as discussed with rmind


# 1.163 31-Jul-2014 rtr

split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind


# 1.162 30-Jul-2014 rtr

split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind


# 1.161 24-Jul-2014 rtr

split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48


# 1.160 23-Jul-2014 rtr

split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind


# 1.159 09-Jul-2014 rtr

* split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind


# 1.158 09-Jul-2014 rtr

* split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47


# 1.157 07-Jul-2014 rtr

* sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.


# 1.156 06-Jul-2014 rtr

* split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind


# 1.155 01-Jul-2014 rtr

fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@


# 1.154 22-Jun-2014 rtr

* split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@


# 1.153 08-Jun-2014 christos

Handle race where the server closed the socket between us 'connecting' and
sending data.


# 1.152 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.151 18-May-2014 rmind

Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.150 23-Jan-2014 hannken

branches: 1.150.2;
Change vnode operations create, mknod, mkdir and symlink to return
the resulting vnode *vpp unlocked.

Discussed on tech-kern@

Welcome to 6.99.30


# 1.149 17-Jan-2014 hannken

Change vnode operations create, mknod, mkdir and symlink to keep the
directory node dvp locked on return.

Discussed on tech-kern@

Welcome to 6.99.29


# 1.148 29-Oct-2013 hannken

Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25


# 1.147 25-Oct-2013 martin

Eliminate an unused variable and simplify the KASSERT which used to use it


# 1.146 08-Oct-2013 christos

Centralize the sockaddr_un allocation code. Set sun_len appropriately so
that the address length returned is correct, not always 106. Note that
we do things slightly differently than linux and explain why. Unit-tests
to come.


# 1.145 08-Oct-2013 christos

- Instead of having accept(2) return a zero-filled sockaddr for the case
where accept(2) was called on a unix socket that called connect(2) and
then close(2), before the connection was accepted, return the empty
sockaddr_un.
- Fix the length of the empty sockaddr_un socket so that it reflects reality.


# 1.144 29-Aug-2013 rmind

Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.


# 1.143 01-Aug-2013 drochner

In unp_externalize, don't do anything if an SCM_RIGHTS control message
was sent with zero file descriptors in it. Otherwise, a zero-length
temporary storage would be allocated which triggers panic on DIAGNOSTIC
kernels (but is harmless for release kernels).
reviewed by Taylor R Campbell


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.142 27-Jun-2013 christos

branches: 1.142.2;
use sbcreatecontrol1() and m_add() instead of open-coding everything, and
getting it slightly wrong.


Revision tags: agc-symver-base
# 1.141 14-Feb-2013 riastradh

Fix some screw cases in cmsg file descriptor passing.

- Don't leave garbage in the control buffer if allocating file
descriptors fails in unp_externalize.

- Scrub the space between CMSG_LEN and CMSG_SPACE to avoid kernel
memory disclosure in unp_externalize.

- Don't read past cmsg_len when closing file descriptors that
couldn't get delivered, in free_rights.

ok christos
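
The padding scrub mentioned in this entry can be illustrated with a small
userland sketch. This is not the kernel's unp_externalize() code: the helper
name cmsg_scrub_padding() is hypothetical, and only the standard CMSG_LEN()
and CMSG_SPACE() macros are assumed. The bytes between the end of the payload
(CMSG_LEN) and the end of the aligned slot (CMSG_SPACE) are zeroed so that no
stale memory is copied out to the receiver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the NetBSD source): zero the alignment
 * padding that follows a control message carrying payload_len bytes.
 * buf must point at the start of the cmsghdr and hold at least
 * CMSG_SPACE(payload_len) bytes.  Returns the number of bytes scrubbed.
 */
static size_t
cmsg_scrub_padding(unsigned char *buf, size_t payload_len)
{
	size_t used = CMSG_LEN(payload_len);	/* header + payload */
	size_t total = CMSG_SPACE(payload_len);	/* header + payload + padding */

	assert(buf != NULL);
	assert(total >= used);

	memset(buf + used, 0, total - used);
	return total - used;
}
```

The header and payload bytes are left untouched; only the trailing padding,
which the sender never initialises, is cleared.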


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.140 06-Oct-2012 christos

Avoid crash dereferencing a NULL fp in fd_affix() in unp_externalize
caused by the sequence of passing two fd's with two sendmsg()'s,
then doing a read() and a recvmsg(). The read() calls dom_dispose()
which discards both messages in the mbuf, and sets the fp's in the
array to NULL. Linux dequeues only one message per read() so the
second recvmsg() gets the fd from the second message. This fix
just avoids the NULL pointer de-reference, making the second
recvmsg() fail. It is dubious to pass fd's with stream sockets
and expect mixing read() and recvmsg() to work. Plus processing
one control message per read() changes the current semantics and
should be examined before being applied. In addition there is a race between
dom_externalize() and dom_dispose(): what happens in a multi-threaded
network stack when one thread disposes where the other externalizes
the same array?

NB: Pullup to 6.


# 1.139 30-Jul-2012 christos

branches: 1.139.2;
remove infinite loop on error, extra parens on return.


# 1.138 30-Jul-2012 christos

simplify unp_externalize(), some from gimpy, some from me.


# 1.137 02-Jun-2012 martin

Stopgap fix for PR kern/46463: disallow passing of kqueue descriptors
via SCM_RIGHTS ancillary socket messages.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.136 26-Jun-2011 christos

branches: 1.136.2; 1.136.8;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
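
As a userland illustration of the socket(2)/socketpair(2) part of this
change, the sketch below creates a local-domain socket pair with SOCK_CLOEXEC
and reads the descriptor flags back with fcntl(2). The function name
cloexec_socketpair_check() is hypothetical, and the example assumes a system
whose headers define SOCK_CLOEXEC.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both ends of a socket pair created
 * with SOCK_CLOEXEC carry FD_CLOEXEC, 0 if not, and -1 on error.
 */
static int
cloexec_socketpair_check(void)
{
	int sv[2];
	int f0, f1;

	/* Request close-on-exec atomically at creation time. */
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) == -1)
		return -1;
	assert(sv[0] != sv[1]);

	f0 = fcntl(sv[0], F_GETFD);
	f1 = fcntl(sv[1], F_GETFD);
	close(sv[0]);
	close(sv[1]);

	if (f0 == -1 || f1 == -1)
		return -1;
	return (f0 & FD_CLOEXEC) != 0 && (f1 & FD_CLOEXEC) != 0;
}
```

Setting the flag at creation avoids the window between socketpair() and a
later fcntl(F_SETFD) in which another thread could fork and exec.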


# 1.135 12-Jun-2011 rmind

Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.134 29-May-2011 manu

branches: 1.134.2;
Add SOCK_SEQPACKET to PF_LOCAL sockets. Based on patch from Jesse Off,
submitted 8 years ago:
http://mail-index.netbsd.org/tech-kern/2003/04/14/0006.html


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.133 19-Nov-2010 dholland

branches: 1.133.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3
# 1.132 21-Oct-2010 yamt

unp_connect: fix an assertion


# 1.131 21-Oct-2010 yamt

unp_connect2: fix a comment.


Revision tags: yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.130 24-Jun-2010 hannken

Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9
# 1.129 09-Feb-2010 wiz

branches: 1.129.2;
Fix typo in comment.


Revision tags: uebayasi-xip-base
# 1.128 08-Jan-2010 pooka

branches: 1.128.2;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.127 26-Aug-2009 bouyer

In uipc_usrreq(PRU_ACCEPT), grab the unp_streamlock before unp_setpeerlocks().
This fixes a race where, for a short period of time, so->so_lock and
so2->so_lock are not in sync. This makes solocked2() and solocked()
unreliable and causes DIAGNOSTIC kernel panics. This also fixes a possible
panic in unp_setaddr() which expects the socket locked.
Should fix kern/38968, fix proposed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005863.html


Revision tags: yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.126 24-May-2009 ad

More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 jym-xensuspend-base
# 1.125 04-May-2009 yamt

tweak some assertions on so_head to make them more meaningful.


Revision tags: nick-hppapmap-base4 nick-hppapmap-base3 nick-hppapmap-base
# 1.124 09-Apr-2009 yamt

0 -> NULL


# 1.123 09-Apr-2009 yamt

remove an unnecessary cast.


# 1.122 09-Apr-2009 yamt

0 -> NULL where appropriate


# 1.121 11-Mar-2009 mrg

completely rework the way that orphaned sockets that are being fdpassed
via SCM_RIGHTS messages are dealt with:

1. unp_gc: make this a kthread.

2. unp_detach: do not call unp_gc directly. instead, wake up unp_gc kthread.

3. unp_scan: do not close files here. instead, put them on a global list
for unp_gc to close, along with a per-file "deferred close count". if
file is already enqueued for close, just increment deferred close count.
this eliminates the recursive calls.

4. unp_gc: scan files on global deferred close list. close each file N
times, as specified by deferred close count in file. continue processing
list until it becomes empty (closing may cause additional files to be
queued for close).

5. unp_gc: add additional bit to mark files we are scanning. set during
initial scan of global file list that currently clears FMARK/FDEFER.
during later scans, never examine / garbage collect descriptors that
we have not marked during the earlier scan. do not proceed with this
initial scan until all deferred closes have been processed. be careful
with locking to ensure no races are introduced between deferred close
and file scan.

6. unp_gc: use dummy file_t to mark position in list when scanning. allow
us to drop filelist_lock. in turn allows us to eliminate kmem_alloc()
and safely close files, etc.

7. prohibit transfer of descriptors within SCM_RIGHTS messages if
(num_files_in_transit > maxfiles / unp_rights_ratio)

8. fd_allocfile: ensure recycled files don't get scanned.


this is 97% work done by andrew doran, with a couple of minor bug fixes
and a lot of testing by yours truly.
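
The deferred-close scheme described for unp_scan and unp_gc can be sketched
in a few lines of plain C. This is a simplified single-threaded model, not
the kernel code: struct xfile, defer_close() and drain_deferred() are
hypothetical names, locking is omitted, and "closing" a file is modelled by
incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's file_t. */
struct xfile {
	struct xfile *next;	/* link on the deferred-close list */
	int queued;		/* non-zero while on the list */
	int defer_count;	/* closes owed to this file */
	int closes;		/* closes performed so far (model only) */
};

static struct xfile *defer_head;	/* global deferred-close list */

/* Scanner side: never close directly, only queue or bump the count. */
static void
defer_close(struct xfile *fp)
{
	assert(fp != NULL);
	if (fp->queued) {
		fp->defer_count++;
		return;
	}
	fp->queued = 1;
	fp->defer_count = 1;
	fp->next = defer_head;
	defer_head = fp;
}

/*
 * Collector side: close each queued file defer_count times and keep
 * going until the list is empty, since a close may queue more files.
 * Returns the total number of closes performed.
 */
static int
drain_deferred(void)
{
	int total = 0;

	while (defer_head != NULL) {
		struct xfile *fp = defer_head;
		int n = fp->defer_count;

		defer_head = fp->next;
		fp->next = NULL;
		fp->queued = 0;
		fp->defer_count = 0;
		while (n-- > 0) {
			fp->closes++;
			total++;
		}
	}
	return total;
}
```

Because the scanner only queues work, the recursion that direct closes would
cause is replaced by a loop in the collector.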


Revision tags: nick-hppapmap-base2
# 1.120 08-Feb-2009 pooka

branches: 1.120.2;
Don't try to fd_putfile() descriptors we didn't manage to fd_getfile().

Fixes local DoS panic described in kern/40570.


Revision tags: netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.119 11-Oct-2008 pooka

branches: 1.119.2; 1.119.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.118 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.117 20-Jun-2008 christos

branches: 1.117.2;
Also enforce that cm->cmsg_len >= CMSG_ALIGN(sizeof cmsghdr), from
Michael van Elst


# 1.116 20-Jun-2008 christos

Don't require cm->cmsg_len == control->m_len, just that the cm->cmsg_len
<= control->m_len, like FreeBSD does. Idea from Taylor R Campbell.
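
Taken together with the lower bound added in revision 1.117, the length
checks amount to something like the sketch below. It is a userland
approximation, not the kernel's unp_internalize(): cmsg_len_ok() is a
hypothetical name, buflen stands in for control->m_len, and CMSG_LEN(0) is
used as a portable spelling of the aligned header size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: accept a control message header only if its
 * claimed length covers the aligned header and fits inside the buffer.
 */
static bool
cmsg_len_ok(const struct cmsghdr *cm, size_t buflen)
{
	assert(cm != NULL);

	/* Lower bound: at least the aligned cmsghdr itself. */
	if ((size_t)cm->cmsg_len < CMSG_LEN(0))
		return false;
	/* Upper bound: may be shorter than the buffer, never longer. */
	if ((size_t)cm->cmsg_len > buflen)
		return false;
	return true;
}
```

Requiring equality with the buffer length would reject messages whose buffer
includes trailing alignment padding, which is why the upper bound is an
inequality.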


Revision tags: yamt-pf42-base4
# 1.115 10-Jun-2008 ad

There can be existing waiters on a socket's condition variables when we
change socket::so_lock, and they rely on the old lock to synchronize.
Wake them up whenever we change so_lock so they can restart their waits.


Revision tags: yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2
# 1.114 28-Apr-2008 martin

branches: 1.114.2; 1.114.4;
Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.113 27-Apr-2008 ad

branches: 1.113.2;
Add a comment.


# 1.112 24-Apr-2008 ad

Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.111 20-Apr-2008 mlelstv

When unp_internalize fails (due to the sanity check or an out-of-memory
condition), it leaves the control message with file descriptors. Calling
unp_dispose() will interpret the message as containing file pointers
and crash the system.
This change removes unp_dispose() from this failure path and avoids
using goto to jump into switch statements...
The previous workaround to ignore such messages in unp_scan() is removed.


# 1.110 19-Apr-2008 mjf

If cm->cmsg_len is not valid for unp_internalize do not use it to work out
where the data is in unp_scan.

Fixes PR/38391


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.109 28-Mar-2008 ad

branches: 1.109.2;
Prevent overlapping calls to bind() and/or connect() on a Unix socket.


Revision tags: ad-socklock-base1
# 1.108 24-Mar-2008 yamt

merge yamt-lazymbuf branch.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14
# 1.107 21-Mar-2008 rmind

unp_gc: unlock filelist_lock in a case of restart.


# 1.106 21-Mar-2008 ad

Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.


Revision tags: keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.105 25-Jan-2008 ad

branches: 1.105.6;
Remove VOP_LEASE. Discussed on tech-kern.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-base
# 1.104 05-Jan-2008 dsl

Use FILE_LOCK() and FILE_UNLOCK()


Revision tags: vmlocking2-base3 yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base jmcneill-pm-base
# 1.103 08-Dec-2007 pooka

branches: 1.103.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.


Revision tags: vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase reinoud-bufcleanup-base
# 1.102 26-Nov-2007 pooka

branches: 1.102.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern


Revision tags: jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 vmlocking-base
# 1.101 08-Oct-2007 ad

branches: 1.101.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.


Revision tags: yamt-x86pmap-base2 yamt-x86pmap-base
# 1.100 19-Sep-2007 dyoung

branches: 1.100.2;
Make uipc_ctloutput() return ENOPROTOOPT instead of EINVAL when it
is passed a socket-option level that it does not care about.


Revision tags: nick-csl-alignment-base5
# 1.99 09-Aug-2007 he

branches: 1.99.2;
Add a new socket option for unix domain sockets: LOCAL_PEEREID, to make
it possible to get the pid, euid and egid of the process at the remote
end at the time it did bind() or connect().

Add a new libc function, getpeereid() to easily get at the euid and egid.
As a consequence, bump libc's minor number.

Document the LOCAL_PEEREID socket option in unix(4).

Based on contribution by Arne H. Juul, minor modifications by myself.


Revision tags: matt-mips64-base
# 1.98 03-Aug-2007 martin

branches: 1.98.2;
PR kern/32842:
do not leak file descriptors when sending a datagram with SCM_RIGHTS
fails. Patch from Gary Thorpe, based on changes in FreeBSD and work
from Christian Biere.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.97 22-Apr-2007 dsl

branches: 1.97.2; 1.97.6;
Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root; if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.


Revision tags: thorpej-atomic-base
# 1.96 03-Apr-2007 hannken

Remove calls to now obsolete vn_start_write() and vn_finished_write().


# 1.95 04-Mar-2007 christos

branches: 1.95.2; 1.95.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 yamt-splraiseipl-base3 newlock2-base netbsd-4-base
# 1.94 01-Nov-2006 cbiere

branches: 1.94.2; 1.94.4; 1.94.8;
Pointing one element past an array is fine, pointing before it isn't.


Revision tags: yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.93 03-Sep-2006 christos

branches: 1.93.2; 1.93.4;
use c99 initializers


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base8 yamt-pdpolicy-base7
# 1.92 23-Jul-2006 ad

Use the LWP cached credentials where sane.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base
# 1.91 14-May-2006 elad

integrate kauth.


Revision tags: elad-kernelauth-base
# 1.90 14-Apr-2006 christos

Coverity CID 1089: Add more KASSERTs to prevent NULL deref.


# 1.89 14-Apr-2006 christos

Coverity CID 1088: Add KASSERT to prevent NULL pointer deref.


# 1.88 13-Apr-2006 matt

Add a KASSERT to document a condition for the PRU_ABORT case.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.87 01-Mar-2006 christos

branches: 1.87.2; 1.87.4; 1.87.6;
PR/32856: Christian Biere: Don't panic if you send a control message with
SCM_RIGHTS on an unconnected stream socket.


# 1.86 11-Dec-2005 christos

branches: 1.86.2; 1.86.4; 1.86.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base ktrace-lwp-base
# 1.85 11-Nov-2005 simonb

Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.


Revision tags: yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.84 30-Aug-2005 jmmv

Honor the user's umask while creating local sockets. Several other systems
already do this (such as FreeBSD, OpenBSD and Linux), so it will improve
portability of some third-party programs. No objections in tech-kern@.
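
The effect can be observed from userland. The following is a minimal sketch,
not code from this commit: bound_socket_mode() is a hypothetical helper that
binds a local socket under a given umask and reports the permission bits of
the resulting socket node. It assumes the path is in a writable directory.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a local socket at `path' while `mask' is the
 * process umask, and return the permission bits of the created node,
 * or -1 on failure. The node is removed again before returning.
 */
static int
bound_socket_mode(const char *path, mode_t mask)
{
	struct sockaddr_un addr;
	struct stat st;
	mode_t old;
	int s, mode = -1;

	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_LOCAL;
	strcpy(addr.sun_path, path);

	if ((s = socket(AF_LOCAL, SOCK_STREAM, 0)) == -1)
		return -1;
	(void)unlink(path);		/* ignore a stale node */

	old = umask(mask);		/* mask applies to the bind() below */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    stat(path, &st) == 0)
		mode = (int)(st.st_mode & 0777);
	(void)umask(old);		/* restore the caller's umask */

	(void)unlink(path);
	(void)close(s);
	return mode;
}
```

On a kernel that honors the umask, calling this with a mask of 077 should
return a mode with no group or other permission bits set.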


# 1.83 16-Jun-2005 yamt

branches: 1.83.2;
uipc_usrreq: plug mbuf leak.


# 1.82 29-May-2005 christos

- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.


# 1.81 07-May-2005 christos

PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.80 26-Feb-2005 perry

branches: 1.80.2;
nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.79 03-Sep-2004 darrenr

branches: 1.79.4; 1.79.6;
add a per-socket counter for dropped UDP packets when the internal buffers
are full.


# 1.78 22-May-2004 jonathan

Eliminate several uses of `curproc' from the socket-layer code and from NFS.

Add a new explicit `struct proc *p' argument to socreate(), sosend().
Use that argument instead of curproc. Follow-on changes to pass that
argument to socreate(), sosend(), and (*so->so_send)() calls.
These changes reviewed and independently recoded by Matt Thomas.

Changes to soreceive() and (*dom->dom_externalize)() from Matt Thomas:
pass soreceive()'s struct uio* uio->uio_procp to unp_externalize().
Eliminate curproc from unp_externalize. Also, now soreceive() uses
its uio->uio_procp value, pass that same value downward to
(*pr->pru_usrreq)() calls for consistency, instead of (struct proc *)0.

Similar changes in sys/nfs to eliminate (most) uses of curproc,
either via the req->r_procp field of a struct nfsreq *req argument,
or by passing down new explicit struct proc * arguments.

Reviewed by: Matt Thomas, posted to tech-kern.
NB: The (*pr->pru_usrreq)() change should be tested on more (all!) protocols.


# 1.77 18-Apr-2004 matt

Constify sun_noname.


# 1.76 18-Apr-2004 matt

ANSI'fy.


# 1.75 17-Apr-2004 christos

PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.


Revision tags: netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.74 23-Mar-2004 junyoung

branches: 1.74.2; 1.74.4;
Nuke __P().


# 1.73 29-Dec-2003 martin

Avoid using m_clget() on a mbuf already in use, especially when we
need the data in the mbuf later and m_clget() changes some fields
overlaid to regular mbuf data. Instead, rearrange code a bit, create
data into a new allocated buffer and use MEXTADD to attach it to
the mbuf, if the mbuf internal space is not sufficient.

This fixes a crash on sparc64 (and probably all other archs where
sizeof(int) != sizeof(struct file *)) when running
regress/sys/kern/unfdpass.

Idea for solution from Matt Thomas, with additional input from YAMAMOTO
Takashi.


# 1.72 29-Nov-2003 matt

Restore a change that made AF_LOCAL sockets block on connect(2) until
accepted. However, this time this behavior is not the default. Instead
it must be enabled by using the LOCAL_CONNWAIT socket option on either the
connecting or accepting socket.


# 1.71 29-Nov-2003 perry

Revert a change that altered the semantics of AF_LOCAL sockets. Sadly
this made us API incompatible with other Unixes.


# 1.70 15-Oct-2003 hannken

Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>


# 1.69 03-Sep-2003 matt

Fix typo.


# 1.68 03-Sep-2003 matt

Change the behavior of AF_LOCAL connect() to sleep until the server has
accepted the connection. This can prevent a client from overwhelming a
server.


# 1.67 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.66 24-Jul-2003 jdolecek

back rev 1.63 (the linux hack) off - no compat specific code
in generic code, please

we need to massage the passed linux cmsg anyway, linux uses different
alignment for CMSG_DATA on at least some architectures


# 1.65 23-Jul-2003 itojun

backout previous, there was a comment on LINUX_SOL_SOCKET=1


# 1.64 23-Jul-2003 itojun

#define LINUX_SOL_SOCKET 1, so that we can answer "what the hell is this 1?"
at ease.


# 1.63 23-Jul-2003 christos

From Todd Vierling: Accept level == 1 for linux compat.


# 1.62 29-Jun-2003 fvdl

branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.


# 1.61 28-Jun-2003 darrenr

Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.60 10-Apr-2003 christos

PR/21088: Jesse Off: Return ENOBUFS instead of EINVAL when sbappend fails.


# 1.59 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


# 1.58 25-Feb-2003 pk

Fix a simple_lock() mismatch in unp_internalize().
We may need to merge the passes over the files contained in the message
as noted by enami tsugutomo on tech-smp.


# 1.57 23-Feb-2003 pk

Make updating a file's reference and use count MP-safe.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.56 25-Nov-2002 itojun

branches: 1.56.2;
no need for error check after MEXTMALLOC - jdolecek


# 1.55 25-Nov-2002 itojun

MEXTMALLOC() can fail even if M_WAITOK, if arg is too big for malloc().


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.54 04-Sep-2002 matt

Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.


Revision tags: netbsd-1-6-PATCH002-RELEASE netbsd-1-6-PATCH002 netbsd-1-6-PATCH002-RC4 netbsd-1-6-PATCH002-RC3 netbsd-1-6-PATCH002-RC2 netbsd-1-6-PATCH002-RC1 netbsd-1-6-PATCH001 netbsd-1-6-PATCH001-RELEASE netbsd-1-6-PATCH001-RC3 netbsd-1-6-PATCH001-RC2 netbsd-1-6-PATCH001-RC1 netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base
# 1.53 12-Nov-2001 lukem

add RCSIDs


# 1.52 18-Oct-2001 thorpej

branches: 1.52.2;
Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.51 14-Jun-2001 thorpej

branches: 1.51.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.


# 1.50 07-Jun-2001 thorpej

Rework fdalloc() even further: split fdalloc() into fdalloc() and
fdexpand(). The former will return ENOSPC if there is not space
in the current filedesc table. The latter performs the expansion
of the filedesc table. This means that fdalloc() won't ever block,
and it gives callers an opportunity to clean up before the
potentially-blocking fdexpand() call.

Update all fdalloc() callers to deal with the need-to-fdexpand() case.

Rewrite unp_externalize() to use fdalloc() and fdexpand() in a
safe way, using an algorithm suggested by Bill Sommerfeld:
- Use a temporary array of integers to hold the new filedesc table
indexes. This allows us to repeat the loop if necessary.
- Loop through the array of file *'s, assigning them to filedesc table
slots. If fdalloc() indicates expansion is necessary, undo the
assignments we've done so far, expand, and retry the whole process.
- Once all file *'s have been assigned to slots, update the f_msgcount
and unp_rights counters.
- Right before we return, copy the temporary integer array to the message
buffer, and trim the length as before.
Note that once locking is added to the filedesc array, this entire
operation will be `atomic', in that the lock will be held while
file *'s are assigned to embryonic table slots, thus preventing anything
else from using them.
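
The retry loop described above can be sketched as follows. This is a toy
model, not the kernel code: struct toy_fdtab, toy_fdalloc(), toy_fdexpand()
and toy_externalize() are all hypothetical names, the table is a plain
array, and the f_msgcount/unp_rights bookkeeping is left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy descriptor table; a NULL slot is free. */
struct toy_fdtab {
	void **slots;
	int nslots;
};

/* Find a free slot without blocking; ENOSPC if the table is full. */
static int
toy_fdalloc(struct toy_fdtab *t, int *fdp)
{
	for (int i = 0; i < t->nslots; i++) {
		if (t->slots[i] == NULL) {
			*fdp = i;
			return 0;
		}
	}
	return ENOSPC;
}

/* Grow the table; in the kernel this is the step that may block. */
static int
toy_fdexpand(struct toy_fdtab *t)
{
	int n = t->nslots ? t->nslots * 2 : 4;
	void **p = realloc(t->slots, (size_t)n * sizeof(*p));

	if (p == NULL)
		return ENOMEM;
	memset(p + t->nslots, 0, (size_t)(n - t->nslots) * sizeof(*p));
	t->slots = p;
	t->nslots = n;
	return 0;
}

/*
 * Assign each (non-NULL) file pointer to a slot. Indexes are collected
 * in a temporary array so the whole pass can be undone and repeated
 * when the table has to grow.
 */
static int
toy_externalize(struct toy_fdtab *t, void **files, int nfiles, int *out)
{
	int *tmp, error, i;

	if (nfiles <= 0)
		return EINVAL;
	if ((tmp = malloc((size_t)nfiles * sizeof(*tmp))) == NULL)
		return ENOMEM;
restart:
	for (i = 0; i < nfiles; i++) {
		if (toy_fdalloc(t, &tmp[i]) == ENOSPC) {
			/* Undo the assignments made so far. */
			while (--i >= 0)
				t->slots[tmp[i]] = NULL;
			if ((error = toy_fdexpand(t)) != 0) {
				free(tmp);
				return error;
			}
			goto restart;
		}
		t->slots[tmp[i]] = files[i];
	}
	/* Only now publish the indexes to the caller's buffer. */
	memcpy(out, tmp, (size_t)nfiles * sizeof(*tmp));
	free(tmp);
	return 0;
}
```

Keeping the indexes in a scratch array means the caller's message buffer
still holds the original file pointers until every slot has been assigned,
which is what makes the retry possible.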


# 1.49 06-Jun-2001 thorpej

Change fdalloc() to return ERESTART if we had to reallocate the
descriptor array, which may have blocked. Change callers of
fdalloc() to restart whatever they're doing if this condition
happens. (XXX unp_externalize() needs some work, but that will
be tackled later.)

Change finishdup() to close the descriptor in the `new' slot if
one exists, and change sys_dup2() accordingly.

Closes a race condition when using kernel-assisted user threads.

While here, garbage-collect UF_MAPPED -- it is not used anywhere.


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base thorpej_scsipi_base
# 1.48 05-Jun-2000 thorpej

branches: 1.48.2; 1.48.4;
Oops, missed a couple of places where CMSG_*() should be used. No
functional change in this case, but the code is now correct.


# 1.47 05-Jun-2000 thorpej

- Fix file descriptor passing AGAIN. This has apparently been broken
on LP64 systems (and probably the SPARC) since the __cmsg_alignbytes()
changes went in.
- Change file descriptor passing to use CMSG_DATA(), not (cm + 1). This
pretty much has to be done in order to make it work properly on LP64,
and considering that it's been broken this long...
- Use CMSG_SPACE() to determine the mbuf length needed for a given
control message, and CMSG_LEN() to stash in the cmsg_len member.
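
The same macros apply on the userland side of descriptor passing. The
sketch below is an illustration only, not the kernel change itself;
send_fd() and recv_fd() are hypothetical helper names. It uses CMSG_SPACE()
to size the control buffer, CMSG_LEN() for cmsg_len, and CMSG_DATA() rather
than (cm + 1) to locate the payload.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Control buffer large enough for one int, aligned for a cmsghdr. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass `fd' over the connected local socket `sock'. */
static int
send_fd(int sock, int fd)
{
	union fd_ctl ctl;
	char byte = 0;			/* one byte of ordinary data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or -1 on failure. */
static int
recv_fd(int sock)
{
	union fd_ctl ctl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS ||
	    cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

The union exists only to give the character buffer the alignment of a
struct cmsghdr; the padding between header and data is whatever CMSG_DATA()
says it is, which is the point of not computing it by hand.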


Revision tags: minoura-xpg4dl-base
# 1.46 30-Mar-2000 augustss

branches: 1.46.2;
Get rid of register declarations.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.45 17-Jun-1999 thorpej

branches: 1.45.2;
Um, hi, let's initialize pointers before we use them.


# 1.44 05-May-1999 thorpej

Add "use counting" to file entries. When closing a file, and its reference
count is 0, wait for use count to drain before finishing the close.

This is necessary in order for multiple processes to safely share file
descriptor tables.


# 1.43 05-May-1999 thorpej

Fix alignment problem in the garbage-collection code path.


# 1.42 30-Apr-1999 thorpej

Break cdir/rdir/cmask info out of struct filedesc, and put it in a new
substructure, `cwdinfo'. Implement optional sharing of this substructure.

This is required for clone(2).


# 1.41 21-Apr-1999 mrg

revert previous. oops.


# 1.40 21-Apr-1999 mrg

properly test the msgsz as "msgsz - len". from PR#7386


Revision tags: netbsd-1-4-base
# 1.39 22-Mar-1999 sommerfe

branches: 1.39.2;
Disallow descriptor-passing of descriptors which are open on
directories which aren't under the recipient's root.

Clean up of many error conditions involving descriptor passing, to
eliminate infinite loops, panics, premature garbage collection of
sockets, and descriptor leaks:
- Avoid letting unp_gc() see descriptors with a refcount of zero by
removing them from the socket's queue before releasing them.
- Avoid socket leak in PRU_ABORT (this will also gc descriptors queued
on a not-yet accepted socket when the accepting socket goes away).
- Put in block comment explaining how unp_gc() should work.
- Correctly manage unp_defer count so we don't get stuck in an infinite
loop with nothing to do.
- Don't tie MARK and DEFER bits so closely together.
- Mark descriptors queued on not-yet-accepted sockets as well.
- Don't call sorflush on non-socket, it doesn't work very well.
- Deal with discard of NULL file pointer.
- Hopefully cause GC to converge faster by only deferring sockets in
unp_mark().


# 1.38 21-Dec-1998 thorpej

In unp_internalize(), add a comment explaining why we must ALIGN() the
data after the cmsghdr when accessing internalized SCM_RIGHTS messages
(i.e. array of struct file *s). The historic interface does not align
the externalized SCM_RIGHTS messages (i.e. array of ints).


# 1.37 21-Dec-1998 thorpej

Fix a fencepost error in unp_scan() which caused a bad pointer deref on
the SPARC platform only (ILP32 but ALIGNBYTES of 7), due to a missing
ALIGN().


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.36 04-Aug-1998 perry

Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
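
Note that the source and destination arguments swap places in the first two
conversions. A small sketch of converted call sites follows; copy_bytes()
and clear_bytes() are hypothetical wrappers used only for illustration.

```c
#include <string.h>

/*
 * Hypothetical wrapper showing a converted copy. The old call took the
 * source first; memcpy() takes the destination first.
 */
static void
copy_bytes(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);		/* was: bcopy(src, dst, len) */
}

/* Hypothetical wrapper showing a converted clear. */
static void
clear_bytes(void *buf, size_t len)
{
	memset(buf, 0, len);		/* was: bzero(buf, len) */
}
```

Like bcopy() in its non-overlapping use, memcpy() must not be given
overlapping regions; those call sites map to memmove() instead.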


# 1.35 31-Jul-1998 perry

fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.


Revision tags: eeh-paddr_t-base
# 1.34 18-Jul-1998 lukem

branches: 1.34.2;
use AF_LOCAL instead of AF_UNIX


# 1.33 16-Jul-1998 thorpej

Back out previous, I botched something.


# 1.32 10-Jul-1998 thorpej

For SOCK_STREAM, provide the socket credentials to the accepter as soon as
the client connects.


# 1.31 01-Mar-1998 fvdl

Merge with Lite2 + local changes


# 1.30 07-Jan-1998 thorpej

Implement passing credentials as ancillary data on Unix domain sockets,
enabled with the LOCAL_CREDS socket option on the listener. Semantics are
similar to BSD/OS's:
- Creds are available with first data on SOCK_STREAM, and with every datagram
on SOCK_DGRAM.
- It is not possible to forge credentials.

Different in that:
- Different credential data structure (ours does not rely on the format
of internal kernel data structures, and does not pass the login name).
- We can pass creds and file descriptors at the same time (this does not
work in BSD/OS).

Luke Mewburn <lukem@netbsd.org> gets credit for inspiring me to implement
this. :-)


# 1.29 07-Jan-1998 thorpej

Fix passing of multiple file descriptors (was broken when code was made
64-bit safe).


Revision tags: netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.28 17-Oct-1997 christos

branches: 1.28.2;
PR/4280: Chris Jones: Sending more than one fd over AF_UNIX sockets causes
panic. Bug in the fd -> struct file * conversion...


Revision tags: thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.27 26-Jun-1997 thorpej

Several small changes to eliminate kludginess in dealing with unix domain
socket names:
- In unp_setsockaddr() and unp_setpeeraddr(), if the socket name can't
fit into a single mbuf, allocate enough external storage space to
hold it.
- In unp_bind() and unp_connect(), perform a similar operation, but allocate
one extra byte, and ensure that the pathname is nul-terminated.
Many thanks to enami tsugutomo <enami@cv.sony.co.jp> for the sanity
checking.


# 1.26 24-Jun-1997 thorpej

Eliminate use of dtom() in the handling of UNIX domain sockets. Add an
"unp_addrlen" member to the unpcb, and use it when copying the socket
name. This eliminates the last uses of dtom() in the system.


# 1.25 15-May-1997 kleink

When fstat(2)ing a file descriptor of a local communications domain socket,
fill the socket's creation time into the stat structure's st_[acm]time fields:
POSIX requires this behavior for pipe(2). N.B.: updating the st_[am]time fields
when reading/writing the pipe is neither required nor implemented, though.


# 1.24 10-Apr-1997 cgd

Internalize and externalize file descriptors being passed via local domain
socket control messages correctly, without assuming that sizeof(int) ==
sizeof(pointer). Fixes PR#3183.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.23 23-May-1996 mycroft

Oops. Add missing label.


# 1.22 23-May-1996 mycroft

We can only get a control mbuf for PRU_SEND or PRU_SENDOOB. Add diagnostic
code to panic in this case.


# 1.21 23-May-1996 mycroft

Make sure the control and data mbufs are freed in all cases.


# 1.20 23-May-1996 mycroft

Separate some code into separate functions.
Make unp_addr be a pointer to the sockaddr, not to the mbuf, as with raw
sockets.
Other minor cleanup.


# 1.19 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.18 09-Feb-1996 christos

branches: 1.18.4;
More proto fixes


# 1.17 04-Feb-1996 pk

unp_detach() return type botch.


# 1.16 04-Feb-1996 christos

First pass at prototyping


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.15 17-Aug-1995 mycroft

so_pcb should be a void *.


# 1.14 16-Aug-1995 mycroft

Allocate PCBs with malloc(), no more mgetclr(). Be more careful to free the
PCB after it's done with.


# 1.13 05-Apr-1995 mycroft

Add missing argument to closef().


# 1.12 13-Dec-1994 mycroft

LEASE_CHECK -> VOP_LEASE


# 1.11 30-Aug-1994 mycroft

Convert process, file, and namei lists and hash tables to use queue.h.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.10 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.9 08-Jun-1994 mycroft

Update to 4.4-Lite fs code.


# 1.8 04-May-1994 mycroft

Fix panic when closing a file descriptor on which access rights have been sent
but not received.


# 1.7 18-Dec-1993 mycroft

Canonicalize all #includes.


Revision tags: magnum-base
# 1.6 14-Sep-1993 mycroft

Fix from Noriyuki Soda <soda@sra.co.jp>:
recvmsg(2) always returns -1 with errno==EMSGSIZE, when trying
to pass file descriptors through UNIX domain socket.


Revision tags: netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.5 27-Jun-1993 andrew

branches: 1.5.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.


# 1.4 12-Jun-1993 andrew

Yuval Yarom's 'panic:closef: count < 0' fix to unp_discard().


# 1.3 22-May-1993 cgd

add include of select.h if necessary for protos, or delete if extraneous


# 1.2 18-May-1993 cgd

make kernel select interface be one-stop shopping & clean it all up.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision