Cross Reference: /freebsd-10.2-release/sys/kern/uipc

uipc_socket.c (191917)	uipc_socket.c (193272)
/*-
 * Copyright (c) 1982, 1986, 1988, 1990, 1993
 *	The Regents of the University of California.
 * Copyright (c) 2004 The FreeBSD Foundation
 * Copyright (c) 2004-2008 Robert N. M. Watson
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)uipc_socket.c	8.3 (Berkeley) 4/15/94
 */

/*
 * Comments on the socket life cycle:
 *
 * soalloc() sets up socket layer state for a socket, called only by
 * socreate() and sonewconn(). Socket layer private.
 *
 * sodealloc() tears down socket layer state for a socket, called only by
 * sofree() and sonewconn(). Socket layer private.
 *
 * pru_attach() associates protocol layer state with an allocated socket;
 * called only once, may fail, aborting socket allocation. This is called
 * from socreate() and sonewconn(). Socket layer private.
 *
 * pru_detach() disassociates protocol layer state from an attached socket,
 * and will be called exactly once for sockets in which pru_attach() has
 * been successfully called. If pru_attach() returned an error,
 * pru_detach() will not be called. Socket layer private.
 *
 * pru_abort() and pru_close() notify the protocol layer that the last
 * consumer of a socket is starting to tear down the socket, and that the
 * protocol should terminate the connection. Historically, pru_abort() also
 * detached protocol state from the socket state, but this is no longer the
 * case.
 *
 * socreate() creates a socket and attaches protocol state. This is a public
 * interface that may be used by socket layer consumers to create new
 * sockets.
 *
 * sonewconn() creates a socket and attaches protocol state. This is a
 * public interface that may be used by protocols to create new sockets when
 * a new connection is received and will be available for accept() on a
 * listen socket.
 *
 * soclose() destroys a socket after possibly waiting for it to disconnect.
 * This is a public interface that socket consumers should use to close and
 * release a socket when done with it.
 *
 * soabort() destroys a socket without waiting for it to disconnect (used
 * only for incoming connections that are already partially or fully
 * connected). This is used internally by the socket layer when clearing
 * listen socket queues (due to overflow or close on the listen socket), but
 * is also a public interface protocols may use to abort connections in
 * their incomplete listen queues should they no longer be required. Sockets
 * placed in completed connection listen queues should not be aborted for
 * reasons described in the comment above the soclose() implementation. This
 * is not a general purpose close routine, and except in the specific
 * circumstances described here, should not be used.
 *
 * sofree() will free a socket and its protocol state if all references on
 * the socket have been released, and is the public interface to attempt to
 * free a socket when a reference is removed. This is a socket layer private
 * interface.
 *
 * NOTE: In addition to socreate() and soclose(), which provide a single
 * socket reference to the consumer to be managed as required, there are two
 * calls to explicitly manage socket references, soref(), and sorele().
 * Currently, these are generally required only when transitioning a socket
 * from a listen queue to a file descriptor, in order to prevent garbage
 * collection of the socket at an untimely moment. For a number of reasons,
 * these interfaces are not preferred, and should be avoided.
 */

#include <sys/cdefs.h>
-__FBSDID("$FreeBSD: head/sys/kern/uipc_socket.c 191917 2009-05-08 14:34:25Z zec $");
+__FBSDID("$FreeBSD: head/sys/kern/uipc_socket.c 193272 2009-06-01 21:17:03Z jhb $");
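The life-cycle rules in the header comment can be illustrated with a small userspace model. This is a sketch only: it is not kernel code and not part of uipc_socket.c, and every `toy_*` name is hypothetical. It assumes just two of the stated rules: detach runs exactly once, and only if attach succeeded, and the free attempt is a no-op while references remain.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct socket; the fields are illustrative only. */
struct toy_socket {
	int	refs;		/* models so_count */
	bool	attached;	/* set once the modelled pru_attach() succeeded */
};

static int toy_detach_calls;	/* how many times toy_detach() has run */

/* Models pru_attach(): may fail, in which case detach must never run. */
static int
toy_attach(struct toy_socket *so, bool fail)
{
	if (fail)
		return (-1);
	so->attached = true;
	return (0);
}

/* Models pru_detach(): called at most once per attached socket. */
static void
toy_detach(struct toy_socket *so)
{
	so->attached = false;
	toy_detach_calls++;
}

/* Models socreate(): allocate, attach, and hand back one reference. */
static struct toy_socket *
toy_create(bool fail_attach)
{
	struct toy_socket *so = calloc(1, sizeof(*so));

	if (so == NULL)
		return (NULL);
	if (toy_attach(so, fail_attach) != 0) {
		free(so);	/* attach failed: deallocate, do not detach */
		return (NULL);
	}
	so->refs = 1;
	return (so);
}

/* Models sofree(): quietly does nothing while references remain. */
static bool
toy_tryfree(struct toy_socket *so)
{
	if (so->refs != 0)
		return (false);
	if (so->attached)
		toy_detach(so);
	free(so);
	return (true);
}

/* Models sorele(): drop one reference, then attempt the free. */
static bool
toy_rele(struct toy_socket *so)
{
	so->refs--;
	return (toy_tryfree(so));
}
```

Taking a second reference (the role soref() plays) and calling `toy_rele()` twice frees the object only on the second call. The real sofree() checks further conditions (SS_NOFDREF, SS_PROTOREF, SQ_COMP) that this model leaves out.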

#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_mac.h"
#include "opt_zero.h"
#include "opt_compat.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/fcntl.h>
#include <sys/limits.h>
#include <sys/lock.h>
#include <sys/mac.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/mutex.h>
#include <sys/domain.h>
#include <sys/file.h>			/* for struct knote */
#include <sys/kernel.h>
#include <sys/event.h>
#include <sys/eventhandler.h>
#include <sys/poll.h>
#include <sys/proc.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/resourcevar.h>
#include <net/route.h>
#include <sys/signalvar.h>
#include <sys/stat.h>
#include <sys/sx.h>
#include <sys/sysctl.h>
#include <sys/uio.h>
#include <sys/jail.h>
#include <sys/vimage.h>

#include <security/mac/mac_framework.h>

#include <vm/uma.h>

#ifdef COMPAT_IA32
#include <sys/mount.h>
#include <sys/sysent.h>
#include <compat/freebsd32/freebsd32.h>
#endif

static int	soreceive_rcvoob(struct socket *so, struct uio *uio,
		    int flags);

static void	filt_sordetach(struct knote *kn);
static int	filt_soread(struct knote *kn, long hint);
static void	filt_sowdetach(struct knote *kn);
static int	filt_sowrite(struct knote *kn, long hint);
static int	filt_solisten(struct knote *kn, long hint);

static struct filterops solisten_filtops =
	{ 1, NULL, filt_sordetach, filt_solisten };
static struct filterops soread_filtops =
	{ 1, NULL, filt_sordetach, filt_soread };
static struct filterops sowrite_filtops =
	{ 1, NULL, filt_sowdetach, filt_sowrite };

uma_zone_t socket_zone;
so_gen_t	so_gencnt;	/* generation count for sockets */

int	maxsockets;

MALLOC_DEFINE(M_SONAME, "soname", "socket name");
MALLOC_DEFINE(M_PCB, "pcb", "protocol control block");

static int somaxconn = SOMAXCONN;
static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS);
/* XXX: we don't have SYSCTL_USHORT */
SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
    0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection "
    "queue size");
static int numopensockets;
SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
    &numopensockets, 0, "Number of open sockets");
#ifdef ZERO_COPY_SOCKETS
/* These aren't static because they're used in other files. */
int so_zero_copy_send = 1;
int so_zero_copy_receive = 1;
SYSCTL_NODE(_kern_ipc, OID_AUTO, zero_copy, CTLFLAG_RD, 0,
    "Zero copy controls");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, receive, CTLFLAG_RW,
    &so_zero_copy_receive, 0, "Enable zero copy receive");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, send, CTLFLAG_RW,
    &so_zero_copy_send, 0, "Enable zero copy send");
#endif /* ZERO_COPY_SOCKETS */

/*
 * accept_mtx locks down per-socket fields relating to accept queues. See
 * socketvar.h for an annotation of the protected fields of struct socket.
 */
struct mtx accept_mtx;
MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF);

/*
 * so_global_mtx protects so_gencnt, numopensockets, and the per-socket
 * so_gencnt field.
 */
static struct mtx so_global_mtx;
MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF);

/*
 * General IPC sysctl name space, used by sockets and a variety of other IPC
 * types.
 */
SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");

/*
 * Sysctl to get and set the maximum global sockets limit. Notify protocols
 * of the change so that they can update their dependent limits as required.
 */
static int
sysctl_maxsockets(SYSCTL_HANDLER_ARGS)
{
	int error, newmaxsockets;

	newmaxsockets = maxsockets;
	error = sysctl_handle_int(oidp, &newmaxsockets, 0, req);
	if (error == 0 && req->newptr) {
		if (newmaxsockets > maxsockets) {
			maxsockets = newmaxsockets;
			if (maxsockets > ((maxfiles / 4) * 3)) {
				maxfiles = (maxsockets * 5) / 4;
				maxfilesperproc = (maxfiles * 9) / 10;
			}
			EVENTHANDLER_INVOKE(maxsockets_change);
		} else
			error = EINVAL;
	}
	return (error);
}

SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW,
    &maxsockets, 0, sysctl_maxsockets, "IU",
    "Maximum number of sockets avaliable");

/*
 * Initialise maxsockets. This SYSINIT must be run after
 * tunable_mbinit().
 */
static void
init_maxsockets(void *ignored)
{

	TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
	maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
}
SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);

/*
 * Socket operation routines. These routines are called by the routines in
 * sys_socket.c or from a system process, and implement the semantics of
 * socket operations by switching out to the protocol specific routines.
 */

/*
 * Get a socket structure from our zone, and initialize it. Note that it
 * would probably be better to allocate socket and PCB at the same time, but
 * I'm not convinced that all the protocols can be easily modified to do
 * this.
 *
 * soalloc() returns a socket with a ref count of 0.
 */
static struct socket *
soalloc(struct vnet *vnet)
{
	struct socket *so;

	so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
	if (so == NULL)
		return (NULL);
#ifdef MAC
	if (mac_socket_init(so, M_NOWAIT) != 0) {
		uma_zfree(socket_zone, so);
		return (NULL);
	}
#endif
	SOCKBUF_LOCK_INIT(&so->so_snd, "so_snd");
	SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
	sx_init(&so->so_snd.sb_sx, "so_snd_sx");
	sx_init(&so->so_rcv.sb_sx, "so_rcv_sx");
	TAILQ_INIT(&so->so_aiojobq);
	mtx_lock(&so_global_mtx);
	so->so_gencnt = ++so_gencnt;
	++numopensockets;
#ifdef VIMAGE
	++vnet->sockcnt;	/* Locked with so_global_mtx. */
	so->so_vnet = vnet;
#endif
	mtx_unlock(&so_global_mtx);
	return (so);
}

/*
 * Free the storage associated with a socket at the socket layer, tear down
 * locks, labels, etc. All protocol state is assumed already to have been
 * torn down (and possibly never set up) by the caller.
 */
static void
sodealloc(struct socket *so)
{

	KASSERT(so->so_count == 0, ("sodealloc(): so_count %d", so->so_count));
	KASSERT(so->so_pcb == NULL, ("sodealloc(): so_pcb != NULL"));

	mtx_lock(&so_global_mtx);
	so->so_gencnt = ++so_gencnt;
	--numopensockets;	/* Could be below, but faster here. */
#ifdef VIMAGE
	--so->so_vnet->sockcnt;
#endif
	mtx_unlock(&so_global_mtx);
	if (so->so_rcv.sb_hiwat)
		(void)chgsbsize(so->so_cred->cr_uidinfo,
		    &so->so_rcv.sb_hiwat, 0, RLIM_INFINITY);
	if (so->so_snd.sb_hiwat)
		(void)chgsbsize(so->so_cred->cr_uidinfo,
		    &so->so_snd.sb_hiwat, 0, RLIM_INFINITY);
#ifdef INET
	/* remove accept filter if one is present. */
	if (so->so_accf != NULL)
		do_setopt_accept_filter(so, NULL);
#endif
#ifdef MAC
	mac_socket_destroy(so);
#endif
	crfree(so->so_cred);
	sx_destroy(&so->so_snd.sb_sx);
	sx_destroy(&so->so_rcv.sb_sx);
	SOCKBUF_LOCK_DESTROY(&so->so_snd);
	SOCKBUF_LOCK_DESTROY(&so->so_rcv);
	uma_zfree(socket_zone, so);
}

/*
 * socreate returns a socket with a ref count of 1. The socket should be
 * closed with soclose().
 */
int
socreate(int dom, struct socket **aso, int type, int proto,
    struct ucred *cred, struct thread *td)
{
	struct protosw *prp;
	struct socket *so;
	int error;

	if (proto)
		prp = pffindproto(dom, proto, type);
	else
		prp = pffindtype(dom, type);

	if (prp == NULL || prp->pr_usrreqs->pru_attach == NULL ||
	    prp->pr_usrreqs->pru_attach == pru_attach_notsupp)
		return (EPROTONOSUPPORT);

	if (prison_check_af(cred, prp->pr_domain->dom_family) != 0)
		return (EPROTONOSUPPORT);

	if (prp->pr_type != type)
		return (EPROTOTYPE);
	so = soalloc(TD_TO_VNET(td));
	if (so == NULL)
		return (ENOBUFS);

	TAILQ_INIT(&so->so_incomp);
	TAILQ_INIT(&so->so_comp);
	so->so_type = type;
	so->so_cred = crhold(cred);
	if ((prp->pr_domain->dom_family == PF_INET) ||
	    (prp->pr_domain->dom_family == PF_ROUTE))
		so->so_fibnum = td->td_proc->p_fibnum;
	else
		so->so_fibnum = 0;
	so->so_proto = prp;
#ifdef MAC
	mac_socket_create(cred, so);
#endif
	knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
	    NULL, NULL, NULL);
	knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
	    NULL, NULL, NULL);
	so->so_count = 1;
	/*
	 * Auto-sizing of socket buffers is managed by the protocols and
	 * the appropriate flags must be set in the pru_attach function.
	 */
	CURVNET_SET(so->so_vnet);
	error = (*prp->pr_usrreqs->pru_attach)(so, proto, td);
	CURVNET_RESTORE();
	if (error) {
		KASSERT(so->so_count == 1, ("socreate: so_count %d",
		    so->so_count));
		so->so_count = 0;
		sodealloc(so);
		return (error);
	}
	*aso = so;
	return (0);
}

#ifdef REGRESSION
static int regression_sonewconn_earlytest = 1;
SYSCTL_INT(_regression, OID_AUTO, sonewconn_earlytest, CTLFLAG_RW,
    &regression_sonewconn_earlytest, 0, "Perform early sonewconn limit test");
#endif

/*
 * When an attempt at a new connection is noted on a socket which accepts
 * connections, sonewconn is called. If the connection is possible (subject
 * to space constraints, etc.) then we allocate a new structure, properly
 * linked into the data structure of the original socket, and return this.
 * Connstatus may be 0, or SO_ISCONFIRMING, or SO_ISCONNECTED.
 *
 * Note: the ref count on the socket is 0 on return.
 */
struct socket *
sonewconn(struct socket *head, int connstatus)
{
	struct socket *so;
	int over;

	ACCEPT_LOCK();
	over = (head->so_qlen > 3 * head->so_qlimit / 2);
	ACCEPT_UNLOCK();
#ifdef REGRESSION
	if (regression_sonewconn_earlytest && over)
#else
	if (over)
#endif
		return (NULL);
	VNET_ASSERT(head->so_vnet);
	so = soalloc(head->so_vnet);
	if (so == NULL)
		return (NULL);
	if ((head->so_options & SO_ACCEPTFILTER) != 0)
		connstatus = 0;
	so->so_head = head;
	so->so_type = head->so_type;
	so->so_options = head->so_options &~ SO_ACCEPTCONN;
	so->so_linger = head->so_linger;
	so->so_state = head->so_state | SS_NOFDREF;
	so->so_proto = head->so_proto;
	so->so_cred = crhold(head->so_cred);
#ifdef MAC
	SOCK_LOCK(head);
	mac_socket_newconn(head, so);
	SOCK_UNLOCK(head);
#endif
	knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
	    NULL, NULL, NULL);
	knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
	    NULL, NULL, NULL);
	if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
	    (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
		sodealloc(so);
		return (NULL);
	}
	so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
	so->so_snd.sb_lowat = head->so_snd.sb_lowat;
	so->so_rcv.sb_timeo = head->so_rcv.sb_timeo;
	so->so_snd.sb_timeo = head->so_snd.sb_timeo;
	so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
	so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;
	so->so_state |= connstatus;
	ACCEPT_LOCK();
	if (connstatus) {
		TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
		so->so_qstate |= SQ_COMP;
		head->so_qlen++;
	} else {
		/*
		 * Keep removing sockets from the head until there's room for
		 * us to insert on the tail. In pre-locking revisions, this
		 * was a simple if(), but as we could be racing with other
		 * threads and soabort() requires dropping locks, we must
		 * loop waiting for the condition to be true.
		 */
		while (head->so_incqlen > head->so_qlimit) {
			struct socket *sp;
			sp = TAILQ_FIRST(&head->so_incomp);
			TAILQ_REMOVE(&head->so_incomp, sp, so_list);
			head->so_incqlen--;
			sp->so_qstate &= ~SQ_INCOMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		TAILQ_INSERT_TAIL(&head->so_incomp, so, so_list);
		so->so_qstate |= SQ_INCOMP;
		head->so_incqlen++;
	}
	ACCEPT_UNLOCK();
	if (connstatus) {
		sorwakeup(head);
		wakeup_one(&head->so_timeo);
	}
	return (so);
}

int
sobind(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	int error;

	CURVNET_SET(so->so_vnet);
	error = (*so->so_proto->pr_usrreqs->pru_bind)(so, nam, td);
	CURVNET_RESTORE();
	return error;
}

/*
 * solisten() transitions a socket from a non-listening state to a listening
 * state, but can also be used to update the listen queue depth on an
 * existing listen socket. The protocol will call back into the sockets
 * layer using solisten_proto_check() and solisten_proto() to check and set
 * socket-layer listen state. Call backs are used so that the protocol can
 * acquire both protocol and socket layer locks in whatever order is required
 * by the protocol.
 *
 * Protocol implementors are advised to hold the socket lock across the
 * socket-layer test and set to avoid races at the socket layer.
 */
int
solisten(struct socket *so, int backlog, struct thread *td)
{

	return ((*so->so_proto->pr_usrreqs->pru_listen)(so, backlog, td));
}

int
solisten_proto_check(struct socket *so)
{

	SOCK_LOCK_ASSERT(so);

	if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
	    SS_ISDISCONNECTING))
		return (EINVAL);
	return (0);
}

void
solisten_proto(struct socket *so, int backlog)
{

	SOCK_LOCK_ASSERT(so);

	if (backlog < 0 || backlog > somaxconn)
		backlog = somaxconn;
	so->so_qlimit = backlog;
	so->so_options |= SO_ACCEPTCONN;
}

/*
 * Attempt to free a socket. This should really be sotryfree().
 *
 * sofree() will succeed if:
 *
 * - There are no outstanding file descriptor references or related consumers
 *   (so_count == 0).
 *
 * - The socket has been closed by user space, if ever open (SS_NOFDREF).
 *
 * - The protocol does not have an outstanding strong reference on the socket
 *   (SS_PROTOREF).
 *
 * - The socket is not in a completed connection queue, so a process has been
 *   notified that it is present. If it is removed, the user process may
 *   block in accept() despite select() saying the socket was ready.
 *
 * Otherwise, it will quietly abort so that a future call to sofree(), when
 * conditions are right, can succeed.
 */
void
sofree(struct socket *so)
{
	struct protosw *pr = so->so_proto;
	struct socket *head;

	ACCEPT_LOCK_ASSERT();
	SOCK_LOCK_ASSERT(so);

	if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
	    (so->so_state & SS_PROTOREF) || (so->so_qstate & SQ_COMP)) {
		SOCK_UNLOCK(so);
		ACCEPT_UNLOCK();
		return;
	}

	head = so->so_head;
	if (head != NULL) {
		KASSERT((so->so_qstate & SQ_COMP) != 0 ||
		    (so->so_qstate & SQ_INCOMP) != 0,
		    ("sofree: so_head != NULL, but neither SQ_COMP nor "
		    "SQ_INCOMP"));
		KASSERT((so->so_qstate & SQ_COMP) == 0 ||
		    (so->so_qstate & SQ_INCOMP) == 0,
		    ("sofree: so->so_qstate is SQ_COMP and also SQ_INCOMP"));
		TAILQ_REMOVE(&head->so_incomp, so, so_list);
		head->so_incqlen--;
		so->so_qstate &= ~SQ_INCOMP;
		so->so_head = NULL;
	}
	KASSERT((so->so_qstate & SQ_COMP) == 0 &&
	    (so->so_qstate & SQ_INCOMP) == 0,
	    ("sofree: so_head == NULL, but still SQ_COMP(%d) or SQ_INCOMP(%d)",
	    so->so_qstate & SQ_COMP, so->so_qstate & SQ_INCOMP));
	if (so->so_options & SO_ACCEPTCONN) {
		KASSERT((TAILQ_EMPTY(&so->so_comp)), ("sofree: so_comp populated"));
		KASSERT((TAILQ_EMPTY(&so->so_incomp)), ("sofree: so_comp populated"));
	}
	SOCK_UNLOCK(so);
	ACCEPT_UNLOCK();

	if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
		(*pr->pr_domain->dom_dispose)(so->so_rcv.sb_mb);
	if (pr->pr_usrreqs->pru_detach != NULL)
		(*pr->pr_usrreqs->pru_detach)(so);

	/*
	 * From this point on, we assume that no other references to this
	 * socket exist anywhere else in the stack. Therefore, no locks need
	 * to be acquired or held.
	 *
	 * We used to do a lot of socket buffer and socket locking here, as
	 * well as invoke sorflush() and perform wakeups. The direct call to
	 * dom_dispose() and sbrelease_internal() are an inlining of what was
	 * necessary from sorflush().
	 *
	 * Notice that the socket buffer and kqueue state are torn down
	 * before calling pru_detach. This means that protocols should not
	 * assume they can perform socket wakeups, etc, in their detach code.
	 */
	sbdestroy(&so->so_snd, so);
	sbdestroy(&so->so_rcv, so);
	knlist_destroy(&so->so_rcv.sb_sel.si_note);
	knlist_destroy(&so->so_snd.sb_sel.si_note);
	sodealloc(so);
}

/*
 * Close a socket on last file table reference removal. Initiate disconnect
 * if connected. Free socket when disconnect complete.
 *
 * This function will sorele() the socket. Note that soclose() may be called
 * prior to the ref count reaching zero. The actual socket structure will
 * not be freed until the ref count reaches zero.
 */
int
soclose(struct socket *so)
{
	int error = 0;

	KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

	CURVNET_SET(so->so_vnet);
	funsetown(&so->so_sigio);
	if (so->so_state & SS_ISCONNECTED) {
		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
			error = sodisconnect(so);
			if (error)
				goto drop;
		}
		if (so->so_options & SO_LINGER) {
			if ((so->so_state & SS_ISDISCONNECTING) &&
			    (so->so_state & SS_NBIO))
				goto drop;
			while (so->so_state & SS_ISCONNECTED) {
				error = tsleep(&so->so_timeo,
				    PSOCK | PCATCH, "soclos", so->so_linger * hz);
				if (error)
					break;
			}
		}
	}

drop:
	if (so->so_proto->pr_usrreqs->pru_close != NULL)
		(*so->so_proto->pr_usrreqs->pru_close)(so);
	if (so->so_options & SO_ACCEPTCONN) {
		struct socket *sp;
		ACCEPT_LOCK();
		while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
			TAILQ_REMOVE(&so->so_incomp, sp, so_list);
			so->so_incqlen--;
			sp->so_qstate &= ~SQ_INCOMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		while ((sp = TAILQ_FIRST(&so->so_comp)) != NULL) {
			TAILQ_REMOVE(&so->so_comp, sp, so_list);
			so->so_qlen--;
			sp->so_qstate &= ~SQ_COMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		ACCEPT_UNLOCK();
	}
	ACCEPT_LOCK();
	SOCK_LOCK(so);
	KASSERT((so->so_state & SS_NOFDREF) == 0, ("soclose: NOFDREF"));
	so->so_state |= SS_NOFDREF;
	sorele(so);
	CURVNET_RESTORE();
	return (error);
}

/*
 * soabort() is used to abruptly tear down a connection, such as when a
 * resource limit is reached (listen queue depth exceeded), or if a listen
 * socket is closed while there are sockets waiting to be accepted.
 *
 * This interface is tricky, because it is called on an unreferenced socket,
 * and must be called only by a thread that has actually removed the socket
 * from the listen queue it was on, or races with other threads are risked.
 *
 * This interface will call into the protocol code, so must not be called
 * with any socket locks held. Protocols do call it while holding their own
 * recursible protocol mutexes, but this is something that should be subject
 * to review in the future.
 */
void
soabort(struct socket *so)
{

	/*
	 * In as much as is possible, assert that no references to this
	 * socket are held. This is not quite the same as asserting that the
	 * current thread is responsible for arranging for no references, but
	 * is as close as we can get for now.
	 */
	KASSERT(so->so_count == 0, ("soabort: so_count"));
	KASSERT((so->so_state & SS_PROTOREF) == 0, ("soabort: SS_PROTOREF"));
	KASSERT(so->so_state & SS_NOFDREF, ("soabort: !SS_NOFDREF"));
	KASSERT((so->so_state & SQ_COMP) == 0, ("soabort: SQ_COMP"));
	KASSERT((so->so_state & SQ_INCOMP) == 0, ("soabort: SQ_INCOMP"));

	if (so->so_proto->pr_usrreqs->pru_abort != NULL)
		(*so->so_proto->pr_usrreqs->pru_abort)(so);
	ACCEPT_LOCK();
	SOCK_LOCK(so);
	sofree(so);
}

int
soaccept(struct socket *so, struct sockaddr **nam)
{
	int error;

	SOCK_LOCK(so);
	KASSERT((so->so_state & SS_NOFDREF) != 0, ("soaccept: !NOFDREF"));
	so->so_state &= ~SS_NOFDREF;
	SOCK_UNLOCK(so);
	error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
	return (error);
}

int
soconnect(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	int error;

	if (so->so_options & SO_ACCEPTCONN)
		return (EOPNOTSUPP);
	/*
	 * If protocol is connection-based, can only connect once.
	 * Otherwise, if connected, try to disconnect first. This allows
	 * user to disconnect by connecting to, e.g., a null address.
	 */
	if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING) &&
	    ((so->so_proto->pr_flags & PR_CONNREQUIRED) ||
	    (error = sodisconnect(so)))) {
		error = EISCONN;
	} else {
		/*
		 * Prevent accumulated error from previous connection from
		 * biting us.
		 */
		so->so_error = 0;
		CURVNET_SET(so->so_vnet);
		error = (*so->so_proto->pr_usrreqs->pru_connect)(so, nam, td);
		CURVNET_RESTORE();
	}

	return (error);
}

int
soconnect2(struct socket *so1, struct socket *so2)
{

	return ((*so1->so_proto->pr_usrreqs->pru_connect2)(so1, so2));
}

int
sodisconnect(struct socket *so)
{
	int error;

	if ((so->so_state & SS_ISCONNECTED) == 0)
		return (ENOTCONN);
	if (so->so_state & SS_ISDISCONNECTING)
		return (EALREADY);
	error = (*so->so_proto->pr_usrreqs->pru_disconnect)(so);
	return (error);
}

#ifdef ZERO_COPY_SOCKETS
struct so_zerocopy_stats{
	int size_ok;
	int align_ok;
	int found_ifp;
};
struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
#include <netinet/in.h>
#include <net/route.h>
#include <netinet/in_pcb.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>

/*
 * sosend_copyin() is only used if zero copy sockets are enabled. Otherwise
 * sosend_dgram() and sosend_generic() use m_uiotombuf().
 *
 * sosend_copyin() accepts a uio and prepares an mbuf chain holding part or
 * all of the data referenced by the uio. If desired, it uses zero-copy.
 * *space will be updated to reflect data copied in.
 *
 * NB: If atomic I/O is requested, the caller must already have checked that
 * space can hold resid bytes.
 *
 * NB: In the event of an error, the caller may need to free the partial
 * chain pointed to by *mpp. The contents of both *uio and *space may be
 * modified even in the case of an error.
 */
static int
sosend_copyin(struct uio *uio, struct mbuf **retmp, int atomic, long *space,
    int flags)
{
	struct mbuf *m, **mp, *top;
	long len, resid;
	int error;
#ifdef ZERO_COPY_SOCKETS
	int cow_send;
#endif

	*retmp = top = NULL;
	mp = &top;
	len = 0;
	resid = uio->uio_resid;
	error = 0;
	do {
#ifdef ZERO_COPY_SOCKETS
		cow_send = 0;
#endif /* ZERO_COPY_SOCKETS */
		if (resid >= MINCLSIZE) {
#ifdef ZERO_COPY_SOCKETS
			if (top == NULL) {
				m = m_gethdr(M_WAITOK, MT_DATA);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;
			} else
				m = m_get(M_WAITOK, MT_DATA);
			if (so_zero_copy_send &&
			    resid>=PAGE_SIZE &&
			    *space>=PAGE_SIZE &&
			    uio->uio_iov->iov_len>=PAGE_SIZE) {
				so_zerocp_stats.size_ok++;
				so_zerocp_stats.align_ok++;
				cow_send = socow_setup(m, uio);
				len = cow_send;
			}
			if (!cow_send) {
				m_clget(m, M_WAITOK);
				len = min(min(MCLBYTES, resid), *space);
			}
#else /* ZERO_COPY_SOCKETS */
			if (top == NULL) {
				m = m_getcl(M_WAIT, MT_DATA, M_PKTHDR);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;
			} else
				m = m_getcl(M_WAIT, MT_DATA, 0);
			len = min(min(MCLBYTES, resid), *space);
#endif /* ZERO_COPY_SOCKETS */
		} else {
			if (top == NULL) {
				m = m_gethdr(M_WAIT, MT_DATA);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;

				len = min(min(MHLEN, resid), *space);
				/*
				 * For datagram protocols, leave room
				 * for protocol headers in first mbuf.
				 */
				if (atomic && m && len < MHLEN)
					MH_ALIGN(m, len);
			} else {
				m = m_get(M_WAIT, MT_DATA);
				len = min(min(MLEN, resid), *space);
			}
		}
		if (m == NULL) {
			error = ENOBUFS;
			goto out;
		}

		*space -= len;
#ifdef ZERO_COPY_SOCKETS
		if (cow_send)
			error = 0;
		else
#endif /* ZERO_COPY_SOCKETS */
		error = uiomove(mtod(m, void *), (int)len, uio);
		resid = uio->uio_resid;
		m->m_len = len;
		*mp = m;
		top->m_pkthdr.len += len;
		if (error)
			goto out;
		mp = &m->m_next;
		if (resid <= 0) {
			if (flags & MSG_EOR)
				top->m_flags |= M_EOR;
			break;
		}
	} while (*space > 0 && atomic);
out:
	*retmp = top;
	return (error);
}
#endif /*ZERO_COPY_SOCKETS*/

#define	SBLOCKWAIT(f)	(((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)

int
sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{
	long space, resid;
	int clen = 0, error, dontroute;
#ifdef ZERO_COPY_SOCKETS
	int atomic = sosendallatonce(so) || top;
#endif

	KASSERT(so->so_type == SOCK_DGRAM, ("sodgram_send: !SOCK_DGRAM"));
	KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
	    ("sodgram_send: !PR_ATOMIC"));

	if (uio != NULL)
		resid = uio->uio_resid;
	else
		resid = top->m_pkthdr.len;
	/*
	 * In theory resid should be unsigned. However, space must be
	 * signed, as it might be less than 0 if we over-committed, and we
	 * must use a signed comparison of space and resid. On the other
	 * hand, a negative resid causes us to loop sending 0-length
	 * segments to the protocol.
	 *
	 * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM
	 * type sockets since that's an error.
972 / 973* if (resid < 0) { 974 error = EINVAL; 975 goto out; 976 } 977 978 dontroute = 979 (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0; 980 if (td != NULL) 981 td->td_ru.ru_msgsnd++; 982 if (control != NULL) 983 clen = control->m_len; 984 985 SOCKBUF_LOCK(&so->so_snd); 986 if (so->so_snd.sb_state & SBS_CANTSENDMORE) { 987 SOCKBUF_UNLOCK(&so->so_snd); 988 error = EPIPE; 989 goto out; 990 } 991 if (so->so_error) { 992 error = so->so_error; 993 so->so_error = 0; 994 SOCKBUF_UNLOCK(&so->so_snd); 995 goto out; 996 } 997 if ((so->so_state & SS_ISCONNECTED) == 0) { 998 /* 999 * `sendto' and `sendmsg' is allowed on a connection-based 1000 * socket if it supports implied connect. Return ENOTCONN if 1001 * not connected and no address is supplied. 1002 / 1003* if ((so->so_proto->pr_flags & PR_CONNREQUIRED) && 1004 (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) { 1005 if ((so->so_state & SS_ISCONFIRMING) == 0 && 1006 !(resid == 0 && clen != 0)) { 1007 SOCKBUF_UNLOCK(&so->so_snd); 1008 error = ENOTCONN; 1009 goto out; 1010 } 1011 } else if (addr == NULL) { 1012 if (so->so_proto->pr_flags & PR_CONNREQUIRED) 1013 error = ENOTCONN; 1014 else 1015 error = EDESTADDRREQ; 1016 SOCKBUF_UNLOCK(&so->so_snd); 1017 goto out; 1018 } 1019 } 1020 1021 /* 1022 * Do we need MSG_OOB support in SOCK_DGRAM? Signs here may be a 1023 * problem and need fixing. 1024 / 1025* space = sbspace(&so->so_snd); 1026 if (flags & MSG_OOB) 1027 space += 1024; 1028 space -= clen; 1029 SOCKBUF_UNLOCK(&so->so_snd); 1030 if (resid > space) { 1031 error = EMSGSIZE; 1032 goto out; 1033 } 1034 if (uio == NULL) { 1035 resid = 0; 1036 if (flags & MSG_EOR) 1037 top->m_flags \|= M_EOR; 1038 } else { 1039#ifdef ZERO_COPY_SOCKETS 1040 error = sosend_copyin(uio, &top, atomic, &space, flags); 1041 if (error) 1042 goto out; 1043#else 1044 /* 1045 * Copy the data from userland into a mbuf chain. 1046 * If no data is to be copied in, a single empty mbuf 1047 * is returned. 
1048 / 1049* top = m_uiotombuf(uio, M_WAITOK, space, max_hdr, 1050 (M_PKTHDR \| ((flags & MSG_EOR) ? M_EOR : 0))); 1051 if (top == NULL) { 1052 error = EFAULT; /* only possible error / 1053* goto out; 1054 } 1055 space -= resid - uio->uio_resid; 1056#endif 1057 resid = uio->uio_resid; 1058 } 1059 KASSERT(resid == 0, ("sosend_dgram: resid != 0")); 1060 /* 1061 * XXXRW: Frobbing SO_DONTROUTE here is even worse without sblock 1062 * than with. 1063 / 1064* if (dontroute) { 1065 SOCK_LOCK(so); 1066 so->so_options \|= SO_DONTROUTE; 1067 SOCK_UNLOCK(so); 1068 } 1069 /* 1070 * XXX all the SBS_CANTSENDMORE checks previously done could be out 1071 * of date. We could have recieved a reset packet in an interrupt or 1072 * maybe we slept while doing page faults in uiomove() etc. We could 1073 * probably recheck again inside the locking protection here, but 1074 * there are probably other places that this also happens. We must 1075 * rethink this. 1076 / 1077* error = (so->so_proto->pr_usrreqs->pru_send)(so, 1078* (flags & MSG_OOB) ? PRUS_OOB : 1079 /* 1080 * If the user set MSG_EOF, the protocol understands this flag and 1081 * nothing left to send then use PRU_SEND_EOF instead of PRU_SEND. 1082 / 1083* ((flags & MSG_EOF) && 1084 (so->so_proto->pr_flags & PR_IMPLOPCL) && 1085 (resid <= 0)) ? 1086 PRUS_EOF : 1087 /* If there is more to send set PRUS_MORETOCOME / 1088* (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0, 1089 top, addr, control, td); 1090 if (dontroute) { 1091 SOCK_LOCK(so); 1092 so->so_options &= ~SO_DONTROUTE; 1093 SOCK_UNLOCK(so); 1094 } 1095 clen = 0; 1096 control = NULL; 1097 top = NULL; 1098out: 1099 if (top != NULL) 1100 m_freem(top); 1101 if (control != NULL) 1102 m_freem(control); 1103 return (error); 1104} 1105 1106/* 1107 * Send on a socket. If send must go all at once and message is larger than 1108 * send buffering, then hard error. Lock against other senders. 
If must go 1109 * all at once and not enough room now, then inform user that this would 1110 * block and do nothing. Otherwise, if nonblocking, send as much as 1111 * possible. The data to be sent is described by "uio" if nonzero, otherwise 1112 * by the mbuf chain "top" (which must be null if uio is not). Data provided 1113 * in mbuf chain must be small enough to send all at once. 1114 * 1115 * Returns nonzero on error, timeout or signal; callers must check for short 1116 * counts if EINTR/ERESTART are returned. Data and control buffers are freed 1117 * on return. 1118 / 1119int 1120sosend_generic(struct socket so, struct sockaddr addr, struct uio uio, 1121 struct mbuf top, struct mbuf control, int flags, struct thread td) 1122{ 1123* long space, resid; 1124 int clen = 0, error, dontroute; 1125 int atomic = sosendallatonce(so) \|\| top; 1126 1127 if (uio != NULL) 1128 resid = uio->uio_resid; 1129 else 1130 resid = top->m_pkthdr.len; 1131 /* 1132 * In theory resid should be unsigned. However, space must be 1133 * signed, as it might be less than 0 if we over-committed, and we 1134 * must use a signed comparison of space and resid. On the other 1135 * hand, a negative resid causes us to loop sending 0-length 1136 * segments to the protocol. 1137 * 1138 * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM 1139 * type sockets since that's an error. 
1140 / 1141* if (resid < 0 \|\| (so->so_type == SOCK_STREAM && (flags & MSG_EOR))) { 1142 error = EINVAL; 1143 goto out; 1144 } 1145 1146 dontroute = 1147 (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0 && 1148 (so->so_proto->pr_flags & PR_ATOMIC); 1149 if (td != NULL) 1150 td->td_ru.ru_msgsnd++; 1151 if (control != NULL) 1152 clen = control->m_len; 1153 1154 error = sblock(&so->so_snd, SBLOCKWAIT(flags)); 1155 if (error) 1156 goto out; 1157 1158restart: 1159 do { 1160 SOCKBUF_LOCK(&so->so_snd); 1161 if (so->so_snd.sb_state & SBS_CANTSENDMORE) { 1162 SOCKBUF_UNLOCK(&so->so_snd); 1163 error = EPIPE; 1164 goto release; 1165 } 1166 if (so->so_error) { 1167 error = so->so_error; 1168 so->so_error = 0; 1169 SOCKBUF_UNLOCK(&so->so_snd); 1170 goto release; 1171 } 1172 if ((so->so_state & SS_ISCONNECTED) == 0) { 1173 /* 1174 * `sendto' and `sendmsg' is allowed on a connection- 1175 * based socket if it supports implied connect. 1176 * Return ENOTCONN if not connected and no address is 1177 * supplied. 
1178 / 1179* if ((so->so_proto->pr_flags & PR_CONNREQUIRED) && 1180 (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) { 1181 if ((so->so_state & SS_ISCONFIRMING) == 0 && 1182 !(resid == 0 && clen != 0)) { 1183 SOCKBUF_UNLOCK(&so->so_snd); 1184 error = ENOTCONN; 1185 goto release; 1186 } 1187 } else if (addr == NULL) { 1188 SOCKBUF_UNLOCK(&so->so_snd); 1189 if (so->so_proto->pr_flags & PR_CONNREQUIRED) 1190 error = ENOTCONN; 1191 else 1192 error = EDESTADDRREQ; 1193 goto release; 1194 } 1195 } 1196 space = sbspace(&so->so_snd); 1197 if (flags & MSG_OOB) 1198 space += 1024; 1199 if ((atomic && resid > so->so_snd.sb_hiwat) \|\| 1200 clen > so->so_snd.sb_hiwat) { 1201 SOCKBUF_UNLOCK(&so->so_snd); 1202 error = EMSGSIZE; 1203 goto release; 1204 } 1205 if (space < resid + clen && 1206 (atomic \|\| space < so->so_snd.sb_lowat \|\| space < clen)) { 1207 if ((so->so_state & SS_NBIO) \|\| (flags & MSG_NBIO)) { 1208 SOCKBUF_UNLOCK(&so->so_snd); 1209 error = EWOULDBLOCK; 1210 goto release; 1211 } 1212 error = sbwait(&so->so_snd); 1213 SOCKBUF_UNLOCK(&so->so_snd); 1214 if (error) 1215 goto release; 1216 goto restart; 1217 } 1218 SOCKBUF_UNLOCK(&so->so_snd); 1219 space -= clen; 1220 do { 1221 if (uio == NULL) { 1222 resid = 0; 1223 if (flags & MSG_EOR) 1224 top->m_flags \|= M_EOR; 1225 } else { 1226#ifdef ZERO_COPY_SOCKETS 1227 error = sosend_copyin(uio, &top, atomic, 1228 &space, flags); 1229 if (error != 0) 1230 goto release; 1231#else 1232 /* 1233 * Copy the data from userland into a mbuf 1234 * chain. If no data is to be copied in, 1235 * a single empty mbuf is returned. 1236 / 1237* top = m_uiotombuf(uio, M_WAITOK, space, 1238 (atomic ? max_hdr : 0), 1239 (atomic ? M_PKTHDR : 0) \| 1240 ((flags & MSG_EOR) ? 
M_EOR : 0)); 1241 if (top == NULL) { 1242 error = EFAULT; /* only possible error / 1243* goto release; 1244 } 1245 space -= resid - uio->uio_resid; 1246#endif 1247 resid = uio->uio_resid; 1248 } 1249 if (dontroute) { 1250 SOCK_LOCK(so); 1251 so->so_options \|= SO_DONTROUTE; 1252 SOCK_UNLOCK(so); 1253 } 1254 /* 1255 * XXX all the SBS_CANTSENDMORE checks previously 1256 * done could be out of date. We could have recieved 1257 * a reset packet in an interrupt or maybe we slept 1258 * while doing page faults in uiomove() etc. We 1259 * could probably recheck again inside the locking 1260 * protection here, but there are probably other 1261 * places that this also happens. We must rethink 1262 * this. 1263 / 1264* error = (so->so_proto->pr_usrreqs->pru_send)(so, 1265* (flags & MSG_OOB) ? PRUS_OOB : 1266 /* 1267 * If the user set MSG_EOF, the protocol understands 1268 * this flag and nothing left to send then use 1269 * PRU_SEND_EOF instead of PRU_SEND. 1270 / 1271* ((flags & MSG_EOF) && 1272 (so->so_proto->pr_flags & PR_IMPLOPCL) && 1273 (resid <= 0)) ? 1274 PRUS_EOF : 1275 /* If there is more to send set PRUS_MORETOCOME. / 1276* (resid > 0 && space > 0) ? 
			    PRUS_MORETOCOME : 0,
			    top, addr, control, td);
			if (dontroute) {
				SOCK_LOCK(so);
				so->so_options &= ~SO_DONTROUTE;
				SOCK_UNLOCK(so);
			}
			clen = 0;
			control = NULL;
			top = NULL;
			if (error)
				goto release;
		} while (resid && space > 0);
	} while (resid);

release:
	sbunlock(&so->so_snd);
out:
	if (top != NULL)
		m_freem(top);
	if (control != NULL)
		m_freem(control);
	return (error);
}

int
sosend(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{
	int error;

	CURVNET_SET(so->so_vnet);
	error = so->so_proto->pr_usrreqs->pru_sosend(so, addr, uio, top,
	    control, flags, td);
	CURVNET_RESTORE();
	return (error);
}

/*
 * The part of soreceive() that implements reading non-inline out-of-band
 * data from a socket. For more complete comments, see soreceive(), from
 * which this code originated.
 *
 * Note that soreceive_rcvoob(), unlike the remainder of soreceive(), is
 * unable to return an mbuf chain to the caller.
 */
static int
soreceive_rcvoob(struct socket *so, struct uio *uio, int flags)
{
	struct protosw *pr = so->so_proto;
	struct mbuf *m;
	int error;

	KASSERT(flags & MSG_OOB, ("soreceive_rcvoob: (flags & MSG_OOB) == 0"));

	m = m_get(M_WAIT, MT_DATA);
	error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
	if (error)
		goto bad;
	do {
#ifdef ZERO_COPY_SOCKETS
		if (so_zero_copy_receive) {
			int disposable;

			if ((m->m_flags & M_EXT)
			 && (m->m_ext.ext_type == EXT_DISPOSABLE))
				disposable = 1;
			else
				disposable = 0;

			error = uiomoveco(mtod(m, void *),
			    min(uio->uio_resid, m->m_len),
			    uio, disposable);
		} else
#endif /* ZERO_COPY_SOCKETS */
		error = uiomove(mtod(m, void *),
		    (int) min(uio->uio_resid, m->m_len), uio);
		m = m_free(m);
	} while (uio->uio_resid && error == 0 && m);
bad:
	if (m != NULL)
		m_freem(m);
	return (error);
}

/*
 * Following replacement or removal of the first mbuf on the first mbuf chain
 * of a socket buffer, push necessary state changes back into the socket
 * buffer so that other consumers see the values consistently. 'nextrecord'
 * is the callers locally stored value of the original value of
 * sb->sb_mb->m_nextpkt which must be restored when the lead mbuf changes.
 * NOTE: 'nextrecord' may be NULL.
 */
static __inline void
sockbuf_pushsync(struct sockbuf *sb, struct mbuf *nextrecord)
{

	SOCKBUF_LOCK_ASSERT(sb);
	/*
	 * First, update for the new value of nextrecord. If necessary, make
	 * it the first record.
	 */
	if (sb->sb_mb != NULL)
		sb->sb_mb->m_nextpkt = nextrecord;
	else
		sb->sb_mb = nextrecord;

	/*
	 * Now update any dependent socket buffer fields to reflect the new
	 * state.
	 * This is an expanded inline of SB_EMPTY_FIXUP(), with the
	 * addition of a second clause that takes care of the case where
	 * sb_mb has been updated, but remains the last record.
	 */
	if (sb->sb_mb == NULL) {
		sb->sb_mbtail = NULL;
		sb->sb_lastrecord = NULL;
	} else if (sb->sb_mb->m_nextpkt == NULL)
		sb->sb_lastrecord = sb->sb_mb;
}


/*
 * Implement receive operations on a socket. We depend on the way that
 * records are added to the sockbuf by sbappend. In particular, each record
 * (mbufs linked through m_next) must begin with an address if the protocol
 * so specifies, followed by an optional mbuf or mbufs containing ancillary
 * data, and then zero or more mbufs of data. In order to allow parallelism
 * between network receive and copying to user space, as well as avoid
 * sleeping with a mutex held, we release the socket buffer mutex during the
 * user space copy. Although the sockbuf is locked, new data may still be
 * appended, and thus we must maintain consistency of the sockbuf during that
 * time.
 *
 * The caller may receive the data as a single mbuf chain by supplying an
 * mbuf **mp0 for use in returning the chain. The uio is then used only for
 * the count in uio_resid.
1412 / 1413int 1414soreceive_generic(struct socket so, struct sockaddr *psa, struct uio uio, 1415 struct mbuf mp0, struct mbuf controlp, int flagsp) 1416{ 1417* struct mbuf m, mp; 1418* int flags, len, error, offset; 1419 struct protosw pr = so->so_proto; 1420* struct mbuf nextrecord; 1421* int moff, type = 0; 1422 int orig_resid = uio->uio_resid; 1423 1424 mp = mp0; 1425 if (psa != NULL) 1426 psa = NULL; 1427* if (controlp != NULL) 1428 controlp = NULL; 1429* if (flagsp != NULL) 1430 flags = flagsp &~ MSG_EOR; 1431* else 1432 flags = 0; 1433 if (flags & MSG_OOB) 1434 return (soreceive_rcvoob(so, uio, flags)); 1435 if (mp != NULL) 1436 mp = NULL; 1437* if ((pr->pr_flags & PR_WANTRCVD) && (so->so_state & SS_ISCONFIRMING) 1438 && uio->uio_resid) 1439 (pr->pr_usrreqs->pru_rcvd)(so, 0); 1440* 1441 error = sblock(&so->so_rcv, SBLOCKWAIT(flags)); 1442 if (error) 1443 return (error); 1444 1445restart: 1446 SOCKBUF_LOCK(&so->so_rcv); 1447 m = so->so_rcv.sb_mb; 1448 /* 1449 * If we have less data than requested, block awaiting more (subject 1450 * to any timeout) if: 1451 * 1. the current count is less than the low water mark, or 1452 * 2. MSG_WAITALL is set, and it is possible to do the entire 1453 * receive operation at once if we block (resid <= hiwat). 1454 * 3. MSG_DONTWAIT is not set 1455 * If MSG_WAITALL is set but resid is larger than the receive buffer, 1456 * we have to do the receive in sections, and thus risk returning a 1457 * short count if a timeout or signal occurs after we start. 
1458 / 1459* if (m == NULL \|\| (((flags & MSG_DONTWAIT) == 0 && 1460 so->so_rcv.sb_cc < uio->uio_resid) && 1461 (so->so_rcv.sb_cc < so->so_rcv.sb_lowat \|\| 1462 ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) && 1463 m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) { 1464 KASSERT(m != NULL \|\| !so->so_rcv.sb_cc, 1465 ("receive: m == %p so->so_rcv.sb_cc == %u", 1466 m, so->so_rcv.sb_cc)); 1467 if (so->so_error) { 1468 if (m != NULL) 1469 goto dontblock; 1470 error = so->so_error; 1471 if ((flags & MSG_PEEK) == 0) 1472 so->so_error = 0; 1473 SOCKBUF_UNLOCK(&so->so_rcv); 1474 goto release; 1475 } 1476 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1477 if (so->so_rcv.sb_state & SBS_CANTRCVMORE) { 1478 if (m == NULL) { 1479 SOCKBUF_UNLOCK(&so->so_rcv); 1480 goto release; 1481 } else 1482 goto dontblock; 1483 } 1484 for (; m != NULL; m = m->m_next) 1485 if (m->m_type == MT_OOBDATA \|\| (m->m_flags & M_EOR)) { 1486 m = so->so_rcv.sb_mb; 1487 goto dontblock; 1488 } 1489 if ((so->so_state & (SS_ISCONNECTED\|SS_ISCONNECTING)) == 0 && 1490 (so->so_proto->pr_flags & PR_CONNREQUIRED)) { 1491 SOCKBUF_UNLOCK(&so->so_rcv); 1492 error = ENOTCONN; 1493 goto release; 1494 } 1495 if (uio->uio_resid == 0) { 1496 SOCKBUF_UNLOCK(&so->so_rcv); 1497 goto release; 1498 } 1499 if ((so->so_state & SS_NBIO) \|\| 1500 (flags & (MSG_DONTWAIT\|MSG_NBIO))) { 1501 SOCKBUF_UNLOCK(&so->so_rcv); 1502 error = EWOULDBLOCK; 1503 goto release; 1504 } 1505 SBLASTRECORDCHK(&so->so_rcv); 1506 SBLASTMBUFCHK(&so->so_rcv); 1507 error = sbwait(&so->so_rcv); 1508 SOCKBUF_UNLOCK(&so->so_rcv); 1509 if (error) 1510 goto release; 1511 goto restart; 1512 } 1513dontblock: 1514 /* 1515 * From this point onward, we maintain 'nextrecord' as a cache of the 1516 * pointer to the next record in the socket buffer. 
We must keep the 1517 * various socket buffer pointers and local stack versions of the 1518 * pointers in sync, pushing out modifications before dropping the 1519 * socket buffer mutex, and re-reading them when picking it up. 1520 * 1521 * Otherwise, we will race with the network stack appending new data 1522 * or records onto the socket buffer by using inconsistent/stale 1523 * versions of the field, possibly resulting in socket buffer 1524 * corruption. 1525 * 1526 * By holding the high-level sblock(), we prevent simultaneous 1527 * readers from pulling off the front of the socket buffer. 1528 / 1529* SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1530 if (uio->uio_td) 1531 uio->uio_td->td_ru.ru_msgrcv++; 1532 KASSERT(m == so->so_rcv.sb_mb, ("soreceive: m != so->so_rcv.sb_mb")); 1533 SBLASTRECORDCHK(&so->so_rcv); 1534 SBLASTMBUFCHK(&so->so_rcv); 1535 nextrecord = m->m_nextpkt; 1536 if (pr->pr_flags & PR_ADDR) { 1537 KASSERT(m->m_type == MT_SONAME, 1538 ("m->m_type == %d", m->m_type)); 1539 orig_resid = 0; 1540 if (psa != NULL) 1541 psa = sodupsockaddr(mtod(m, struct sockaddr ), 1542 M_NOWAIT); 1543 if (flags & MSG_PEEK) { 1544 m = m->m_next; 1545 } else { 1546 sbfree(&so->so_rcv, m); 1547 so->so_rcv.sb_mb = m_free(m); 1548 m = so->so_rcv.sb_mb; 1549 sockbuf_pushsync(&so->so_rcv, nextrecord); 1550 } 1551 } 1552 1553 /* 1554 * Process one or more MT_CONTROL mbufs present before any data mbufs 1555 * in the first mbuf chain on the socket buffer. If MSG_PEEK, we 1556 * just copy the data; if !MSG_PEEK, we call into the protocol to 1557 * perform externalization (or freeing if controlp == NULL). 
1558 / 1559* if (m != NULL && m->m_type == MT_CONTROL) { 1560 struct mbuf cm = NULL, cmn; 1561 struct mbuf *cme = &cm; 1562* 1563 do { 1564 if (flags & MSG_PEEK) { 1565 if (controlp != NULL) { 1566 controlp = m_copy(m, 0, m->m_len); 1567* controlp = &(controlp)->m_next; 1568* } 1569 m = m->m_next; 1570 } else { 1571 sbfree(&so->so_rcv, m); 1572 so->so_rcv.sb_mb = m->m_next; 1573 m->m_next = NULL; 1574 cme = m; 1575* cme = &(cme)->m_next; 1576* m = so->so_rcv.sb_mb; 1577 } 1578 } while (m != NULL && m->m_type == MT_CONTROL); 1579 if ((flags & MSG_PEEK) == 0) 1580 sockbuf_pushsync(&so->so_rcv, nextrecord); 1581 while (cm != NULL) { 1582 cmn = cm->m_next; 1583 cm->m_next = NULL; 1584 if (pr->pr_domain->dom_externalize != NULL) { 1585 SOCKBUF_UNLOCK(&so->so_rcv); 1586 error = (pr->pr_domain->dom_externalize) 1587* (cm, controlp); 1588 SOCKBUF_LOCK(&so->so_rcv); 1589 } else if (controlp != NULL) 1590 controlp = cm; 1591* else 1592 m_freem(cm); 1593 if (controlp != NULL) { 1594 orig_resid = 0; 1595 while (controlp != NULL) 1596* controlp = &(controlp)->m_next; 1597* } 1598 cm = cmn; 1599 } 1600 if (m != NULL) 1601 nextrecord = so->so_rcv.sb_mb->m_nextpkt; 1602 else 1603 nextrecord = so->so_rcv.sb_mb; 1604 orig_resid = 0; 1605 } 1606 if (m != NULL) { 1607 if ((flags & MSG_PEEK) == 0) { 1608 KASSERT(m->m_nextpkt == nextrecord, 1609 ("soreceive: post-control, nextrecord !sync")); 1610 if (nextrecord == NULL) { 1611 KASSERT(so->so_rcv.sb_mb == m, 1612 ("soreceive: post-control, sb_mb!=m")); 1613 KASSERT(so->so_rcv.sb_lastrecord == m, 1614 ("soreceive: post-control, lastrecord!=m")); 1615 } 1616 } 1617 type = m->m_type; 1618 if (type == MT_OOBDATA) 1619 flags \|= MSG_OOB; 1620 } else { 1621 if ((flags & MSG_PEEK) == 0) { 1622 KASSERT(so->so_rcv.sb_mb == nextrecord, 1623 ("soreceive: sb_mb != nextrecord")); 1624 if (so->so_rcv.sb_mb == NULL) { 1625 KASSERT(so->so_rcv.sb_lastrecord == NULL, 1626 ("soreceive: sb_lastercord != NULL")); 1627 } 1628 } 1629 } 1630 
SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1631 SBLASTRECORDCHK(&so->so_rcv); 1632 SBLASTMBUFCHK(&so->so_rcv); 1633 1634 /* 1635 * Now continue to read any data mbufs off of the head of the socket 1636 * buffer until the read request is satisfied. Note that 'type' is 1637 * used to store the type of any mbuf reads that have happened so far 1638 * such that soreceive() can stop reading if the type changes, which 1639 * causes soreceive() to return only one of regular data and inline 1640 * out-of-band data in a single socket receive operation. 1641 / 1642* moff = 0; 1643 offset = 0; 1644 while (m != NULL && uio->uio_resid > 0 && error == 0) { 1645 /* 1646 * If the type of mbuf has changed since the last mbuf 1647 * examined ('type'), end the receive operation. 1648 / 1649* SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1650 if (m->m_type == MT_OOBDATA) { 1651 if (type != MT_OOBDATA) 1652 break; 1653 } else if (type == MT_OOBDATA) 1654 break; 1655 else 1656 KASSERT(m->m_type == MT_DATA, 1657 ("m->m_type == %d", m->m_type)); 1658 so->so_rcv.sb_state &= ~SBS_RCVATMARK; 1659 len = uio->uio_resid; 1660 if (so->so_oobmark && len > so->so_oobmark - offset) 1661 len = so->so_oobmark - offset; 1662 if (len > m->m_len - moff) 1663 len = m->m_len - moff; 1664 /* 1665 * If mp is set, just pass back the mbufs. Otherwise copy 1666 * them out via the uio, then free. Sockbuf must be 1667 * consistent here (points to current mbuf, it points to next 1668 * record) when we drop priority; we must note any additions 1669 * to the sockbuf when we block interrupts again. 
1670 / 1671* if (mp == NULL) { 1672 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1673 SBLASTRECORDCHK(&so->so_rcv); 1674 SBLASTMBUFCHK(&so->so_rcv); 1675 SOCKBUF_UNLOCK(&so->so_rcv); 1676#ifdef ZERO_COPY_SOCKETS 1677 if (so_zero_copy_receive) { 1678 int disposable; 1679 1680 if ((m->m_flags & M_EXT) 1681 && (m->m_ext.ext_type == EXT_DISPOSABLE)) 1682 disposable = 1; 1683 else 1684 disposable = 0; 1685 1686 error = uiomoveco(mtod(m, char ) + moff, 1687* (int)len, uio, 1688 disposable); 1689 } else 1690#endif /* ZERO_COPY_SOCKETS / 1691* error = uiomove(mtod(m, char ) + moff, (int)len, uio); 1692* SOCKBUF_LOCK(&so->so_rcv); 1693 if (error) { 1694 /* 1695 * The MT_SONAME mbuf has already been removed 1696 * from the record, so it is necessary to 1697 * remove the data mbufs, if any, to preserve 1698 * the invariant in the case of PR_ADDR that 1699 * requires MT_SONAME mbufs at the head of 1700 * each record. 1701 / 1702* if (m && pr->pr_flags & PR_ATOMIC && 1703 ((flags & MSG_PEEK) == 0)) 1704 (void)sbdroprecord_locked(&so->so_rcv); 1705 SOCKBUF_UNLOCK(&so->so_rcv); 1706 goto release; 1707 } 1708 } else 1709 uio->uio_resid -= len; 1710 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1711 if (len == m->m_len - moff) { 1712 if (m->m_flags & M_EOR) 1713 flags \|= MSG_EOR; 1714 if (flags & MSG_PEEK) { 1715 m = m->m_next; 1716 moff = 0; 1717 } else { 1718 nextrecord = m->m_nextpkt; 1719 sbfree(&so->so_rcv, m); 1720 if (mp != NULL) { 1721 mp = m; 1722* mp = &m->m_next; 1723 so->so_rcv.sb_mb = m = m->m_next; 1724 mp = NULL; 1725* } else { 1726 so->so_rcv.sb_mb = m_free(m); 1727 m = so->so_rcv.sb_mb; 1728 } 1729 sockbuf_pushsync(&so->so_rcv, nextrecord); 1730 SBLASTRECORDCHK(&so->so_rcv); 1731 SBLASTMBUFCHK(&so->so_rcv); 1732 } 1733 } else { 1734 if (flags & MSG_PEEK) 1735 moff += len; 1736 else { 1737 if (mp != NULL) { 1738 int copy_flag; 1739 1740 if (flags & MSG_DONTWAIT) 1741 copy_flag = M_DONTWAIT; 1742 else 1743 copy_flag = M_WAIT; 1744 if (copy_flag == M_WAIT) 1745 SOCKBUF_UNLOCK(&so->so_rcv); 
1746 mp = m_copym(m, 0, len, copy_flag); 1747* if (copy_flag == M_WAIT) 1748 SOCKBUF_LOCK(&so->so_rcv); 1749 if (mp == NULL) { 1750* /* 1751 * m_copym() couldn't 1752 * allocate an mbuf. Adjust 1753 * uio_resid back (it was 1754 * adjusted down by len 1755 * bytes, which we didn't end 1756 * up "copying" over). 1757 / 1758* uio->uio_resid += len; 1759 break; 1760 } 1761 } 1762 m->m_data += len; 1763 m->m_len -= len; 1764 so->so_rcv.sb_cc -= len; 1765 } 1766 } 1767 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1768 if (so->so_oobmark) { 1769 if ((flags & MSG_PEEK) == 0) { 1770 so->so_oobmark -= len; 1771 if (so->so_oobmark == 0) { 1772 so->so_rcv.sb_state \|= SBS_RCVATMARK; 1773 break; 1774 } 1775 } else { 1776 offset += len; 1777 if (offset == so->so_oobmark) 1778 break; 1779 } 1780 } 1781 if (flags & MSG_EOR) 1782 break; 1783 /* 1784 * If the MSG_WAITALL flag is set (for non-atomic socket), we 1785 * must not quit until "uio->uio_resid == 0" or an error 1786 * termination. If a signal/timeout occurs, return with a 1787 * short count but without error. Keep sockbuf locked 1788 * against other readers. 1789 / 1790* while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 && 1791 !sosendallatonce(so) && nextrecord == NULL) { 1792 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1793 if (so->so_error \|\| so->so_rcv.sb_state & SBS_CANTRCVMORE) 1794 break; 1795 /* 1796 * Notify the protocol that some data has been 1797 * drained before blocking. 
1798 / 1799* if (pr->pr_flags & PR_WANTRCVD) { 1800 SOCKBUF_UNLOCK(&so->so_rcv); 1801 (pr->pr_usrreqs->pru_rcvd)(so, flags); 1802* SOCKBUF_LOCK(&so->so_rcv); 1803 } 1804 SBLASTRECORDCHK(&so->so_rcv); 1805 SBLASTMBUFCHK(&so->so_rcv); 1806 error = sbwait(&so->so_rcv); 1807 if (error) { 1808 SOCKBUF_UNLOCK(&so->so_rcv); 1809 goto release; 1810 } 1811 m = so->so_rcv.sb_mb; 1812 if (m != NULL) 1813 nextrecord = m->m_nextpkt; 1814 } 1815 } 1816 1817 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 1818 if (m != NULL && pr->pr_flags & PR_ATOMIC) { 1819 flags \|= MSG_TRUNC; 1820 if ((flags & MSG_PEEK) == 0) 1821 (void) sbdroprecord_locked(&so->so_rcv); 1822 } 1823 if ((flags & MSG_PEEK) == 0) { 1824 if (m == NULL) { 1825 /* 1826 * First part is an inline SB_EMPTY_FIXUP(). Second 1827 * part makes sure sb_lastrecord is up-to-date if 1828 * there is still data in the socket buffer. 1829 / 1830* so->so_rcv.sb_mb = nextrecord; 1831 if (so->so_rcv.sb_mb == NULL) { 1832 so->so_rcv.sb_mbtail = NULL; 1833 so->so_rcv.sb_lastrecord = NULL; 1834 } else if (nextrecord->m_nextpkt == NULL) 1835 so->so_rcv.sb_lastrecord = nextrecord; 1836 } 1837 SBLASTRECORDCHK(&so->so_rcv); 1838 SBLASTMBUFCHK(&so->so_rcv); 1839 /* 1840 * If soreceive() is being done from the socket callback, 1841 * then don't need to generate ACK to peer to update window, 1842 * since ACK will be generated on return to TCP. 
		 */
		if (!(flags & MSG_SOCALLBCK) &&
		    (pr->pr_flags & PR_WANTRCVD)) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			(*pr->pr_usrreqs->pru_rcvd)(so, flags);
			SOCKBUF_LOCK(&so->so_rcv);
		}
	}
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
	if (orig_resid == uio->uio_resid && orig_resid &&
	    (flags & MSG_EOR) == 0 && (so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0) {
		SOCKBUF_UNLOCK(&so->so_rcv);
		goto restart;
	}
	SOCKBUF_UNLOCK(&so->so_rcv);

	if (flagsp != NULL)
		*flagsp |= flags;
release:
	sbunlock(&so->so_rcv);
	return (error);
}

/*
 * Optimized version of soreceive() for simple datagram cases from userspace.
 * Unlike in the stream case, we're able to drop a datagram if copyout()
 * fails, and because we handle datagrams atomically, we don't need to use a
 * sleep lock to prevent I/O interlacing.
 */
int
soreceive_dgram(struct socket *so, struct sockaddr **psa, struct uio *uio,
    struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{
	struct mbuf *m, *m2;
	int flags, len, error;
	struct protosw *pr = so->so_proto;
	struct mbuf *nextrecord;

	if (psa != NULL)
		*psa = NULL;
	if (controlp != NULL)
		*controlp = NULL;
	if (flagsp != NULL)
		flags = *flagsp &~ MSG_EOR;
	else
		flags = 0;

	/*
	 * For any complicated cases, fall back to the full
	 * soreceive_generic().
	 */
	if (mp0 != NULL || (flags & MSG_PEEK) || (flags & MSG_OOB))
		return (soreceive_generic(so, psa, uio, mp0, controlp,
		    flagsp));

	/*
	 * Enforce restrictions on use.
	 */
	KASSERT((pr->pr_flags & PR_WANTRCVD) == 0,
	    ("soreceive_dgram: wantrcvd"));
	KASSERT(pr->pr_flags & PR_ATOMIC, ("soreceive_dgram: !atomic"));
	KASSERT((so->so_rcv.sb_state & SBS_RCVATMARK) == 0,
	    ("soreceive_dgram: SBS_RCVATMARK"));
	KASSERT((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0,
	    ("soreceive_dgram: P_CONNREQUIRED"));

	/*
	 * Loop blocking while waiting for a datagram.
	 */
	SOCKBUF_LOCK(&so->so_rcv);
	while ((m = so->so_rcv.sb_mb) == NULL) {
		KASSERT(so->so_rcv.sb_cc == 0,
		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
		    so->so_rcv.sb_cc));
		if (so->so_error) {
			error = so->so_error;
			so->so_error = 0;
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (error);
		}
		if (so->so_rcv.sb_state & SBS_CANTRCVMORE ||
		    uio->uio_resid == 0) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (0);
		}
		if ((so->so_state & SS_NBIO) ||
		    (flags & (MSG_DONTWAIT|MSG_NBIO))) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (EWOULDBLOCK);
		}
		SBLASTRECORDCHK(&so->so_rcv);
		SBLASTMBUFCHK(&so->so_rcv);
		error = sbwait(&so->so_rcv);
		if (error) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (error);
		}
	}
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);

	if (uio->uio_td)
		uio->uio_td->td_ru.ru_msgrcv++;
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);
	nextrecord = m->m_nextpkt;
	if (nextrecord == NULL) {
		KASSERT(so->so_rcv.sb_lastrecord == m,
		    ("soreceive_dgram: lastrecord != m"));
	}

	KASSERT(so->so_rcv.sb_mb->m_nextpkt == nextrecord,
	    ("soreceive_dgram: m_nextpkt != nextrecord"));

	/*
	 * Pull 'm' and its chain off the front of the packet queue.
	 */
	so->so_rcv.sb_mb = NULL;
	sockbuf_pushsync(&so->so_rcv, nextrecord);

	/*
	 * Walk 'm's chain and free that many bytes from the socket buffer.
	 */
	for (m2 = m; m2 != NULL; m2 = m2->m_next)
		sbfree(&so->so_rcv, m2);

	/*
	 * Do a few last checks before we let go of the lock.
	 */
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);
	SOCKBUF_UNLOCK(&so->so_rcv);

	if (pr->pr_flags & PR_ADDR) {
		KASSERT(m->m_type == MT_SONAME,
		    ("m->m_type == %d", m->m_type));
		if (psa != NULL)
			*psa = sodupsockaddr(mtod(m, struct sockaddr *),
			    M_NOWAIT);
		m = m_free(m);
	}
	if (m == NULL) {
		/* XXXRW: Can this happen? */
		return (0);
	}

	/*
	 * Packet to copyout() is now in 'm' and it is disconnected from the
	 * queue.
	 *
	 * Process one or more MT_CONTROL mbufs present before any data mbufs
	 * in the first mbuf chain on the socket buffer. We call into the
	 * protocol to perform externalization (or freeing if controlp ==
	 * NULL).
	 */
	if (m->m_type == MT_CONTROL) {
		struct mbuf *cm = NULL, *cmn;
		struct mbuf **cme = &cm;

		do {
			m2 = m->m_next;
			m->m_next = NULL;
			*cme = m;
			cme = &(*cme)->m_next;
			m = m2;
		} while (m != NULL && m->m_type == MT_CONTROL);
		while (cm != NULL) {
			cmn = cm->m_next;
			cm->m_next = NULL;
			if (pr->pr_domain->dom_externalize != NULL) {
				error = (*pr->pr_domain->dom_externalize)
				    (cm, controlp);
			} else if (controlp != NULL)
				*controlp = cm;
			else
				m_freem(cm);
			if (controlp != NULL) {
				while (*controlp != NULL)
					controlp = &(*controlp)->m_next;
			}
			cm = cmn;
		}
	}
	KASSERT(m->m_type == MT_DATA, ("soreceive_dgram: !data"));

	while (m != NULL && uio->uio_resid > 0) {
		len = uio->uio_resid;
		if (len > m->m_len)
			len = m->m_len;
		error = uiomove(mtod(m, char *), (int)len, uio);
		if (error) {
			m_freem(m);
			return (error);
		}
		m = m_free(m);
	}
	if (m != NULL)
		flags |= MSG_TRUNC;
	m_freem(m);
	if (flagsp != NULL)
		*flagsp |= flags;
	return (0);
}

int
soreceive(struct socket *so, struct sockaddr **psa, struct uio *uio,
    struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{

	return (so->so_proto->pr_usrreqs->pru_soreceive(so, psa, uio, mp0,
	    controlp, flagsp));
}

int
soshutdown(struct socket *so, int how)
{
	struct protosw *pr = so->so_proto;
	int error;

	if (!(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR))
		return (EINVAL);
	if (pr->pr_usrreqs->pru_flush != NULL) {
		(*pr->pr_usrreqs->pru_flush)(so, how);
	}
	if (how != SHUT_WR)
		sorflush(so);
	if (how != SHUT_RD) {
		CURVNET_SET(so->so_vnet);
		error = (*pr->pr_usrreqs->pru_shutdown)(so);
		CURVNET_RESTORE();
		return (error);
	}
	return (0);
}

void
sorflush(struct socket *so)
{
	struct sockbuf *sb = &so->so_rcv;
	struct protosw *pr = so->so_proto;
	struct sockbuf asb;

	/*
	 * In order to avoid calling dom_dispose with the socket buffer mutex
	 * held, and in order to generally avoid holding the lock for a long
	 * time, we make a copy of the socket buffer and clear the original
	 * (except locks, state). The new socket buffer copy won't have
	 * initialized locks so we can only call routines that won't use or
	 * assert those locks.
	 *
	 * Dislodge threads currently blocked in receive and wait to acquire
	 * a lock against other simultaneous readers before clearing the
	 * socket buffer. Don't let our acquire be interrupted by a signal
	 * despite any existing socket disposition on interruptable waiting.
	 */
	CURVNET_SET(so->so_vnet);
	socantrcvmore(so);
	(void) sblock(sb, SBL_WAIT | SBL_NOINTR);

	/*
	 * Invalidate/clear most of the sockbuf structure, but leave selinfo
	 * and mutex data unchanged.
	 */
	SOCKBUF_LOCK(sb);
	bzero(&asb, offsetof(struct sockbuf, sb_startzero));
	bcopy(&sb->sb_startzero, &asb.sb_startzero,
	    sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
	bzero(&sb->sb_startzero,
	    sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
	SOCKBUF_UNLOCK(sb);
	sbunlock(sb);

	/*
	 * Dispose of special rights and flush the socket buffer. Don't call
	 * any unsafe routines (that rely on locks being initialized) on asb.
	 */
	if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
		(*pr->pr_domain->dom_dispose)(asb.sb_mb);
	sbrelease_internal(&asb, so);
	CURVNET_RESTORE();
}

/*
 * Perhaps this routine, and sooptcopyout(), below, ought to come in an
 * additional variant to handle the case where the option value needs to be
 * some kind of integer, but not a specific size. In addition to their use
 * here, these functions are also called by the protocol-level pr_ctloutput()
 * routines.
 */
int
sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen)
{
	size_t valsize;

	/*
	 * If the user gives us more than we wanted, we ignore it, but if we
	 * don't get the minimum length the caller wants, we return EINVAL.
	 * On success, sopt->sopt_valsize is set to however much we actually
	 * retrieved.
	 */
	if ((valsize = sopt->sopt_valsize) < minlen)
		return EINVAL;
	if (valsize > len)
		sopt->sopt_valsize = valsize = len;

	if (sopt->sopt_td != NULL)
		return (copyin(sopt->sopt_val, buf, valsize));

	bcopy(sopt->sopt_val, buf, valsize);
	return (0);
}

/*
 * Kernel version of setsockopt(2).
 *
 * XXX: optlen is size_t, not socklen_t
 */
int
so_setsockopt(struct socket *so, int level, int optname, void *optval,
    size_t optlen)
{
	struct sockopt sopt;

	sopt.sopt_level = level;
	sopt.sopt_name = optname;
	sopt.sopt_dir = SOPT_SET;
	sopt.sopt_val = optval;
	sopt.sopt_valsize = optlen;
	sopt.sopt_td = NULL;
	return (sosetopt(so, &sopt));
}

int
sosetopt(struct socket *so, struct sockopt *sopt)
{
	int error, optval;
	struct linger l;
	struct timeval tv;
	u_long val;
#ifdef MAC
	struct mac extmac;
#endif

	error = 0;
	if (sopt->sopt_level != SOL_SOCKET) {
		if (so->so_proto && so->so_proto->pr_ctloutput)
			return ((*so->so_proto->pr_ctloutput)
			    (so, sopt));
		error = ENOPROTOOPT;
	} else {
		switch (sopt->sopt_name) {
#ifdef INET
		case SO_ACCEPTFILTER:
			error = do_setopt_accept_filter(so, sopt);
			if (error)
				goto bad;
			break;
#endif
		case SO_LINGER:
			error = sooptcopyin(sopt, &l, sizeof l, sizeof l);
			if (error)
				goto bad;

			SOCK_LOCK(so);
			so->so_linger = l.l_linger;
			if (l.l_onoff)
				so->so_options |= SO_LINGER;
			else
				so->so_options &= ~SO_LINGER;
			SOCK_UNLOCK(so);
			break;

		case SO_DEBUG:
		case SO_KEEPALIVE:
		case SO_DONTROUTE:
		case SO_USELOOPBACK:
		case SO_BROADCAST:
		case SO_REUSEADDR:
		case SO_REUSEPORT:
		case SO_OOBINLINE:
		case SO_TIMESTAMP:
		case SO_BINTIME:
		case SO_NOSIGPIPE:
		case SO_NO_DDP:
		case SO_NO_OFFLOAD:
			error = sooptcopyin(sopt, &optval, sizeof optval,
			    sizeof optval);
			if (error)
				goto bad;
			SOCK_LOCK(so);
			if (optval)
				so->so_options |= sopt->sopt_name;
			else
				so->so_options &= ~sopt->sopt_name;
			SOCK_UNLOCK(so);
			break;

		case SO_SETFIB:
			error = sooptcopyin(sopt, &optval, sizeof optval,
			    sizeof
			    optval);
			if (optval < 1 || optval > rt_numfibs) {
				error = EINVAL;
				goto bad;
			}
			if ((so->so_proto->pr_domain->dom_family == PF_INET) ||
			    (so->so_proto->pr_domain->dom_family == PF_ROUTE)) {
				so->so_fibnum = optval;
				/* Note: ignore error */
				if (so->so_proto && so->so_proto->pr_ctloutput)
					(*so->so_proto->pr_ctloutput)(so, sopt);
			} else {
				so->so_fibnum = 0;
			}
			break;
		case SO_SNDBUF:
		case SO_RCVBUF:
		case SO_SNDLOWAT:
		case SO_RCVLOWAT:
			error = sooptcopyin(sopt, &optval, sizeof optval,
			    sizeof optval);
			if (error)
				goto bad;

			/*
			 * Values < 1 make no sense for any of these options,
			 * so disallow them.
			 */
			if (optval < 1) {
				error = EINVAL;
				goto bad;
			}

			switch (sopt->sopt_name) {
			case SO_SNDBUF:
			case SO_RCVBUF:
				if (sbreserve(sopt->sopt_name == SO_SNDBUF ?
				    &so->so_snd : &so->so_rcv, (u_long)optval,
				    so, curthread) == 0) {
					error = ENOBUFS;
					goto bad;
				}
				(sopt->sopt_name == SO_SNDBUF ? &so->so_snd :
				    &so->so_rcv)->sb_flags &= ~SB_AUTOSIZE;
				break;

			/*
			 * Make sure the low-water is never greater than the
			 * high-water.
			 */
			case SO_SNDLOWAT:
				SOCKBUF_LOCK(&so->so_snd);
				so->so_snd.sb_lowat =
				    (optval > so->so_snd.sb_hiwat) ?
				    so->so_snd.sb_hiwat : optval;
				SOCKBUF_UNLOCK(&so->so_snd);
				break;
			case SO_RCVLOWAT:
				SOCKBUF_LOCK(&so->so_rcv);
				so->so_rcv.sb_lowat =
				    (optval > so->so_rcv.sb_hiwat) ?
				    so->so_rcv.sb_hiwat : optval;
				SOCKBUF_UNLOCK(&so->so_rcv);
				break;
			}
			break;

		case SO_SNDTIMEO:
		case SO_RCVTIMEO:
#ifdef COMPAT_IA32
			if (SV_CURPROC_FLAG(SV_ILP32)) {
				struct timeval32 tv32;

				error = sooptcopyin(sopt, &tv32, sizeof tv32,
				    sizeof tv32);
				CP(tv32, tv, tv_sec);
				CP(tv32, tv, tv_usec);
			} else
#endif
				error = sooptcopyin(sopt, &tv, sizeof tv,
				    sizeof tv);
			if (error)
				goto bad;

			/* assert(hz > 0); */
			if (tv.tv_sec < 0 || tv.tv_sec > INT_MAX / hz ||
			    tv.tv_usec < 0 || tv.tv_usec >= 1000000) {
				error = EDOM;
				goto bad;
			}
			/* assert(tick > 0); */
			/* assert(ULONG_MAX - INT_MAX >= 1000000); */
			val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick;
			if (val > INT_MAX) {
				error = EDOM;
				goto bad;
			}
			if (val == 0 && tv.tv_usec != 0)
				val = 1;

			switch (sopt->sopt_name) {
			case SO_SNDTIMEO:
				so->so_snd.sb_timeo = val;
				break;
			case SO_RCVTIMEO:
				so->so_rcv.sb_timeo = val;
				break;
			}
			break;

		case SO_LABEL:
#ifdef MAC
			error = sooptcopyin(sopt, &extmac, sizeof extmac,
			    sizeof extmac);
			if (error)
				goto bad;
			error = mac_setsockopt_label(sopt->sopt_td->td_ucred,
			    so, &extmac);
#else
			error = EOPNOTSUPP;
#endif
			break;

		default:
			error = ENOPROTOOPT;
			break;
		}
		if (error == 0 && so->so_proto != NULL &&
		    so->so_proto->pr_ctloutput != NULL) {
			(void) ((*so->so_proto->pr_ctloutput)
			    (so, sopt));
		}
	}
bad:
	return (error);
}

/*
 * Helper routine for getsockopt.
 */
int
sooptcopyout(struct sockopt *sopt, const void *buf, size_t len)
{
	int error;
	size_t valsize;

	error = 0;

	/*
	 * Documented get behavior is that we always return a value, possibly
	 * truncated to fit in the user's buffer.
	 * Traditional behavior is
	 * that we always tell the user precisely how much we copied, rather
	 * than something useful like the total amount we had available for
	 * her. Note that this interface is not idempotent; the entire
	 * answer must generated ahead of time.
	 */
	valsize = min(len, sopt->sopt_valsize);
	sopt->sopt_valsize = valsize;
	if (sopt->sopt_val != NULL) {
		if (sopt->sopt_td != NULL)
			error = copyout(buf, sopt->sopt_val, valsize);
		else
			bcopy(buf, sopt->sopt_val, valsize);
	}
	return (error);
}

int
sogetopt(struct socket *so, struct sockopt *sopt)
{
	int error, optval;
	struct linger l;
	struct timeval tv;
#ifdef MAC
	struct mac extmac;
#endif

	error = 0;
	if (sopt->sopt_level != SOL_SOCKET) {
		if (so->so_proto && so->so_proto->pr_ctloutput) {
			return ((*so->so_proto->pr_ctloutput)
			    (so, sopt));
		} else
			return (ENOPROTOOPT);
	} else {
		switch (sopt->sopt_name) {
#ifdef INET
		case SO_ACCEPTFILTER:
			error = do_getopt_accept_filter(so, sopt);
			break;
#endif
		case SO_LINGER:
			SOCK_LOCK(so);
			l.l_onoff = so->so_options & SO_LINGER;
			l.l_linger = so->so_linger;
			SOCK_UNLOCK(so);
			error = sooptcopyout(sopt, &l, sizeof l);
			break;

		case SO_USELOOPBACK:
		case SO_DONTROUTE:
		case SO_DEBUG:
		case SO_KEEPALIVE:
		case SO_REUSEADDR:
		case SO_REUSEPORT:
		case SO_BROADCAST:
		case SO_OOBINLINE:
		case SO_ACCEPTCONN:
		case SO_TIMESTAMP:
		case SO_BINTIME:
		case SO_NOSIGPIPE:
			optval = so->so_options & sopt->sopt_name;
integer:
			error = sooptcopyout(sopt, &optval, sizeof optval);
			break;

		case SO_TYPE:
			optval = so->so_type;
			goto integer;

		case SO_ERROR:
			SOCK_LOCK(so);
			optval = so->so_error;
			so->so_error = 0;
			SOCK_UNLOCK(so);
			goto integer;

		case SO_SNDBUF:
			optval = so->so_snd.sb_hiwat;
			goto integer;

		case SO_RCVBUF:
			optval = so->so_rcv.sb_hiwat;
			goto integer;

		case SO_SNDLOWAT:
			optval = so->so_snd.sb_lowat;
			goto integer;

		case SO_RCVLOWAT:
			optval = so->so_rcv.sb_lowat;
			goto integer;

		case SO_SNDTIMEO:
		case SO_RCVTIMEO:
			optval = (sopt->sopt_name == SO_SNDTIMEO ?
			    so->so_snd.sb_timeo : so->so_rcv.sb_timeo);

			tv.tv_sec = optval / hz;
			tv.tv_usec = (optval % hz) * tick;
#ifdef COMPAT_IA32
			if (SV_CURPROC_FLAG(SV_ILP32)) {
				struct timeval32 tv32;

				CP(tv, tv32, tv_sec);
				CP(tv, tv32, tv_usec);
				error = sooptcopyout(sopt, &tv32, sizeof tv32);
			} else
#endif
				error = sooptcopyout(sopt, &tv, sizeof tv);
			break;

		case SO_LABEL:
#ifdef MAC
			error = sooptcopyin(sopt, &extmac, sizeof(extmac),
			    sizeof(extmac));
			if (error)
				return (error);
			error = mac_getsockopt_label(sopt->sopt_td->td_ucred,
			    so, &extmac);
			if (error)
				return (error);
			error = sooptcopyout(sopt, &extmac, sizeof extmac);
#else
			error = EOPNOTSUPP;
#endif
			break;

		case SO_PEERLABEL:
#ifdef MAC
			error = sooptcopyin(sopt, &extmac, sizeof(extmac),
			    sizeof(extmac));
			if (error)
				return (error);
			error = mac_getsockopt_peerlabel(
			    sopt->sopt_td->td_ucred, so, &extmac);
			if (error)
				return (error);
			error = sooptcopyout(sopt, &extmac, sizeof extmac);
#else
			error = EOPNOTSUPP;
#endif
			break;

		case SO_LISTENQLIMIT:
			optval = so->so_qlimit;
			goto integer;

		case SO_LISTENQLEN:
			optval = so->so_qlen;
			goto integer;

		case SO_LISTENINCQLEN:
			optval = so->so_incqlen;
			goto integer;

		default:
			error = ENOPROTOOPT;
			break;
		}
		return (error);
	}
}

/* XXX; prepare mbuf for (__FreeBSD__ < 3) routines.
 */
int
soopt_getm(struct sockopt *sopt, struct mbuf **mp)
{
	struct mbuf *m, *m_prev;
	int sopt_size = sopt->sopt_valsize;

	MGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT, MT_DATA);
	if (m == NULL)
		return ENOBUFS;
	if (sopt_size > MLEN) {
		MCLGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT);
		if ((m->m_flags & M_EXT) == 0) {
			m_free(m);
			return ENOBUFS;
		}
		m->m_len = min(MCLBYTES, sopt_size);
	} else {
		m->m_len = min(MLEN, sopt_size);
	}
	sopt_size -= m->m_len;
	*mp = m;
	m_prev = m;

	while (sopt_size) {
		MGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT, MT_DATA);
		if (m == NULL) {
			m_freem(*mp);
			return ENOBUFS;
		}
		if (sopt_size > MLEN) {
			MCLGET(m, sopt->sopt_td != NULL ? M_WAIT :
			    M_DONTWAIT);
			if ((m->m_flags & M_EXT) == 0) {
				m_freem(m);
				m_freem(*mp);
				return ENOBUFS;
			}
			m->m_len = min(MCLBYTES, sopt_size);
		} else {
			m->m_len = min(MLEN, sopt_size);
		}
		sopt_size -= m->m_len;
		m_prev->m_next = m;
		m_prev = m;
	}
	return (0);
}

/* XXX; copyin sopt data into mbuf chain for (__FreeBSD__ < 3) routines.
 */
int
soopt_mcopyin(struct sockopt *sopt, struct mbuf *m)
{
	struct mbuf *m0 = m;

	if (sopt->sopt_val == NULL)
		return (0);
	while (m != NULL && sopt->sopt_valsize >= m->m_len) {
		if (sopt->sopt_td != NULL) {
			int error;

			error = copyin(sopt->sopt_val, mtod(m, char *),
			    m->m_len);
			if (error != 0) {
				m_freem(m0);
				return(error);
			}
		} else
			bcopy(sopt->sopt_val, mtod(m, char *), m->m_len);
		sopt->sopt_valsize -= m->m_len;
		sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
		m = m->m_next;
	}
	if (m != NULL) /* should be allocated enoughly at ip6_sooptmcopyin() */
		panic("ip6_sooptmcopyin");
	return (0);
}

/* XXX; copyout mbuf chain data into soopt for (__FreeBSD__ < 3) routines. */
int
soopt_mcopyout(struct sockopt *sopt, struct mbuf *m)
{
	struct mbuf *m0 = m;
	size_t valsize = 0;

	if (sopt->sopt_val == NULL)
		return (0);
	while (m != NULL && sopt->sopt_valsize >= m->m_len) {
		if (sopt->sopt_td != NULL) {
			int error;

			error = copyout(mtod(m, char *), sopt->sopt_val,
			    m->m_len);
			if (error != 0) {
				m_freem(m0);
				return(error);
			}
		} else
			bcopy(mtod(m, char *), sopt->sopt_val, m->m_len);
		sopt->sopt_valsize -= m->m_len;
		sopt->sopt_val = (char *)sopt->sopt_val + m->m_len;
		valsize += m->m_len;
		m = m->m_next;
	}
	if (m != NULL) {
		/* enough soopt buffer should be given from user-land */
		m_freem(m0);
		return(EINVAL);
	}
	sopt->sopt_valsize = valsize;
	return (0);
}

/*
 * sohasoutofband(): protocol notifies socket layer of the arrival of new
 * out-of-band data, which will then notify socket consumers.
 */
void
sohasoutofband(struct socket *so)
{

	if (so->so_sigio != NULL)
		pgsigio(&so->so_sigio, SIGURG, 0);
	selwakeuppri(&so->so_rcv.sb_sel, PSOCK);
}

int
sopoll(struct socket *so, int events, struct ucred *active_cred,
    struct thread *td)
{

	return (so->so_proto->pr_usrreqs->pru_sopoll(so, events, active_cred,
	    td));
}

int
sopoll_generic(struct socket *so, int events, struct ucred *active_cred,
    struct thread *td)
{
	int revents = 0;

	SOCKBUF_LOCK(&so->so_snd);
	SOCKBUF_LOCK(&so->so_rcv);
	if (events & (POLLIN | POLLRDNORM))
		if (soreadable(so))
			revents |= events & (POLLIN | POLLRDNORM);

	if (events & POLLINIGNEOF)
		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
			revents |= POLLINIGNEOF;

	if (events & (POLLOUT | POLLWRNORM))
		if (sowriteable(so))
			revents |= events & (POLLOUT | POLLWRNORM);

	if (events & (POLLPRI | POLLRDBAND))
		if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK))
			revents |= events & (POLLPRI | POLLRDBAND);

	if (revents == 0) {
		if (events &
		    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM |
		    POLLRDBAND)) {
			selrecord(td, &so->so_rcv.sb_sel);
			so->so_rcv.sb_flags |= SB_SEL;
		}

		if (events & (POLLOUT | POLLWRNORM)) {
			selrecord(td, &so->so_snd.sb_sel);
			so->so_snd.sb_flags |= SB_SEL;
		}
	}

	SOCKBUF_UNLOCK(&so->so_rcv);
	SOCKBUF_UNLOCK(&so->so_snd);
	return (revents);
}

int
soo_kqfilter(struct file *fp, struct knote *kn)
{
	struct socket *so = kn->kn_fp->f_data;
	struct sockbuf *sb;

	switch (kn->kn_filter) {
	case EVFILT_READ:
		if (so->so_options & SO_ACCEPTCONN)
			kn->kn_fop = &solisten_filtops;
		else
			kn->kn_fop = &soread_filtops;
		sb = &so->so_rcv;
		break;
	case EVFILT_WRITE:
		kn->kn_fop = &sowrite_filtops;
		sb = &so->so_snd;
		break;
	default:
		return (EINVAL);
	}

	SOCKBUF_LOCK(sb);
	knlist_add(&sb->sb_sel.si_note, kn, 1);
	sb->sb_flags |= SB_KNOTE;
	SOCKBUF_UNLOCK(sb);
	return (0);
}

/*
 * Some routines that return EOPNOTSUPP for entry points that are not
 * supported by a protocol. Fill in as needed.
 */
int
pru_accept_notsupp(struct socket *so, struct sockaddr **nam)
{

	return EOPNOTSUPP;
}

int
pru_attach_notsupp(struct socket *so, int proto, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_bind_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_connect_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_connect2_notsupp(struct socket *so1, struct socket *so2)
{

	return EOPNOTSUPP;
}

int
pru_control_notsupp(struct socket *so, u_long cmd, caddr_t data,
    struct ifnet *ifp, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_disconnect_notsupp(struct socket *so)
{

	return EOPNOTSUPP;
}

int
pru_listen_notsupp(struct socket *so, int backlog, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_peeraddr_notsupp(struct socket *so, struct sockaddr **nam)
{

	return EOPNOTSUPP;
}

int
pru_rcvd_notsupp(struct socket *so, int flags)
{

	return EOPNOTSUPP;
}

int
pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags)
{

	return EOPNOTSUPP;
}

int
pru_send_notsupp(struct socket *so, int flags, struct mbuf *m,
    struct sockaddr *addr, struct mbuf *control, struct thread *td)
{

	return EOPNOTSUPP;
}

/*
 * This
 * isn't really a ``null'' operation, but it's the default one and
 * doesn't do anything destructive.
 */
int
pru_sense_null(struct socket *so, struct stat *sb)
{

	sb->st_blksize = so->so_snd.sb_hiwat;
	return 0;
}

int
pru_shutdown_notsupp(struct socket *so)
{

	return EOPNOTSUPP;
}

int
pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam)
{

	return EOPNOTSUPP;
}

int
pru_sosend_notsupp(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{

	return EOPNOTSUPP;
}

int
pru_soreceive_notsupp(struct socket *so, struct sockaddr **paddr,
    struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{

	return EOPNOTSUPP;
}

int
pru_sopoll_notsupp(struct socket *so, int events, struct ucred *cred,
    struct thread *td)
{

	return EOPNOTSUPP;
}

static void
filt_sordetach(struct knote *kn)
{
	struct socket *so = kn->kn_fp->f_data;

	SOCKBUF_LOCK(&so->so_rcv);
	knlist_remove(&so->so_rcv.sb_sel.si_note, kn, 1);
	if (knlist_empty(&so->so_rcv.sb_sel.si_note))
		so->so_rcv.sb_flags &= ~SB_KNOTE;
	SOCKBUF_UNLOCK(&so->so_rcv);
}

/*ARGSUSED*/
static int
filt_soread(struct knote *kn, long hint)
{
	struct socket *so;

	so = kn->kn_fp->f_data;
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);

	kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl;
	if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
		kn->kn_flags |= EV_EOF;
		kn->kn_fflags = so->so_error;
		return (1);
	} else if (so->so_error) /* temporary udp error */
		return (1);
	else if (kn->kn_sfflags & NOTE_LOWAT)
		return (kn->kn_data >= kn->kn_sdata);
	else
		return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat);
}

static void
filt_sowdetach(struct
    knote *kn)
{
	struct socket *so = kn->kn_fp->f_data;

	SOCKBUF_LOCK(&so->so_snd);
	knlist_remove(&so->so_snd.sb_sel.si_note, kn, 1);
	if (knlist_empty(&so->so_snd.sb_sel.si_note))
		so->so_snd.sb_flags &= ~SB_KNOTE;
	SOCKBUF_UNLOCK(&so->so_snd);
}

/*ARGSUSED*/
static int
filt_sowrite(struct knote *kn, long hint)
{
	struct socket *so;

	so = kn->kn_fp->f_data;
	SOCKBUF_LOCK_ASSERT(&so->so_snd);
	kn->kn_data = sbspace(&so->so_snd);
	if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
		kn->kn_flags |= EV_EOF;
		kn->kn_fflags = so->so_error;
		return (1);
	} else if (so->so_error) /* temporary udp error */
		return (1);
	else if (((so->so_state & SS_ISCONNECTED) == 0) &&
	    (so->so_proto->pr_flags & PR_CONNREQUIRED))
		return (0);
	else if (kn->kn_sfflags & NOTE_LOWAT)
		return (kn->kn_data >= kn->kn_sdata);
	else
		return (kn->kn_data >= so->so_snd.sb_lowat);
}

/*ARGSUSED*/
static int
filt_solisten(struct knote *kn, long hint)
{
	struct socket *so = kn->kn_fp->f_data;

	kn->kn_data = so->so_qlen;
	return (! TAILQ_EMPTY(&so->so_comp));
}

int
socheckuid(struct socket *so, uid_t uid)
{

	if (so == NULL)
		return (EPERM);
	if (so->so_cred->cr_uid != uid)
		return (EPERM);
	return (0);
}

static int
sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
{
	int error;
	int val;

	val = somaxconn;
	error = sysctl_handle_int(oidp, &val, 0, req);
	if (error || !req->newptr )
		return (error);

	if (val < 1 || val > USHRT_MAX)
		return (EINVAL);

	somaxconn = val;
	return (0);
}

/*
 * These functions are used by protocols to notify the socket layer (and its
 * consumers) of state changes in the sockets driven by protocol-side events.
 */

/*
 * Procedures to manipulate state flags of socket and do appropriate wakeups.
 *
 * Normal sequence from the active (originating) side is that
 * soisconnecting() is called during processing of connect() call, resulting
 * in an eventual call to soisconnected() if/when the connection is
 * established. When the connection is torn down soisdisconnecting() is
 * called during processing of disconnect() call, and soisdisconnected() is
 * called when the connection to the peer is totally severed. The semantics
 * of these routines are such that connectionless protocols can call
 * soisconnected() and soisdisconnected() only, bypassing the in-progress
 * calls when setting up a ``connection'' takes no time.
 *
 * From the passive side, a socket is created with two queues of sockets:
 * so_incomp for connections in progress and so_comp for connections already
 * made and awaiting user acceptance. As a protocol is preparing incoming
 * connections, it creates a socket structure queued on so_incomp by calling
 * sonewconn(). When the connection is established, soisconnected() is
 * called, and transfers the socket structure to so_comp, making it available
 * to accept().
 *
 * If a socket is closed with sockets on either so_incomp or so_comp, these
 * sockets are dropped.
 *
 * If higher-level protocols are implemented in the kernel, the wakeups done
 * here will sometimes cause software-interrupt process scheduling.
 */
void
soisconnecting(struct socket *so)
{

	SOCK_LOCK(so);
	so->so_state &= ~(SS_ISCONNECTED|SS_ISDISCONNECTING);
	so->so_state |= SS_ISCONNECTING;
	SOCK_UNLOCK(so);
}

void
soisconnected(struct socket *so)
{

uipc_socket.c (193272)

#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_mac.h"
#include "opt_zero.h"
#include "opt_compat.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/fcntl.h>
#include <sys/limits.h>
#include <sys/lock.h>
#include <sys/mac.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/mutex.h>
#include <sys/domain.h>
#include <sys/file.h> /* for struct knote */
#include <sys/kernel.h>
#include <sys/event.h>
#include <sys/eventhandler.h>
#include <sys/poll.h>
#include <sys/proc.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/resourcevar.h>
#include <net/route.h>
#include <sys/signalvar.h>
#include <sys/stat.h>
#include <sys/sx.h>
#include <sys/sysctl.h>
#include <sys/uio.h>
#include <sys/jail.h>
#include <sys/vimage.h>

#include <security/mac/mac_framework.h>

#include <vm/uma.h>

#ifdef COMPAT_IA32
#include <sys/mount.h>
#include <sys/sysent.h>
#include <compat/freebsd32/freebsd32.h>
#endif

static int soreceive_rcvoob(struct socket *so, struct uio *uio,
    int flags);

static void filt_sordetach(struct knote *kn);
static int filt_soread(struct knote *kn, long hint);
static void filt_sowdetach(struct knote *kn);
static int filt_sowrite(struct knote *kn, long hint);
static int filt_solisten(struct knote *kn, long hint);

static struct filterops solisten_filtops =
	{ 1, NULL, filt_sordetach, filt_solisten };
static struct filterops soread_filtops =
	{ 1, NULL, filt_sordetach, filt_soread };
static struct filterops sowrite_filtops =
	{ 1, NULL, filt_sowdetach,
	    filt_sowrite };

uma_zone_t socket_zone;
so_gen_t so_gencnt; /* generation count for sockets */

int maxsockets;

MALLOC_DEFINE(M_SONAME, "soname", "socket name");
MALLOC_DEFINE(M_PCB, "pcb", "protocol control block");

static int somaxconn = SOMAXCONN;
static int sysctl_somaxconn(SYSCTL_HANDLER_ARGS);
/* XXX: we dont have SYSCTL_USHORT */
SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn, CTLTYPE_UINT | CTLFLAG_RW,
    0, sizeof(int), sysctl_somaxconn, "I", "Maximum pending socket connection "
    "queue size");
static int numopensockets;
SYSCTL_INT(_kern_ipc, OID_AUTO, numopensockets, CTLFLAG_RD,
    &numopensockets, 0, "Number of open sockets");
#ifdef ZERO_COPY_SOCKETS
/* These aren't static because they're used in other files. */
int so_zero_copy_send = 1;
int so_zero_copy_receive = 1;
SYSCTL_NODE(_kern_ipc, OID_AUTO, zero_copy, CTLFLAG_RD, 0,
    "Zero copy controls");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, receive, CTLFLAG_RW,
    &so_zero_copy_receive, 0, "Enable zero copy receive");
SYSCTL_INT(_kern_ipc_zero_copy, OID_AUTO, send, CTLFLAG_RW,
    &so_zero_copy_send, 0, "Enable zero copy send");
#endif /* ZERO_COPY_SOCKETS */

/*
 * accept_mtx locks down per-socket fields relating to accept queues. See
 * socketvar.h for an annotation of the protected fields of struct socket.
 */
struct mtx accept_mtx;
MTX_SYSINIT(accept_mtx, &accept_mtx, "accept", MTX_DEF);

/*
 * so_global_mtx protects so_gencnt, numopensockets, and the per-socket
 * so_gencnt field.
 */
static struct mtx so_global_mtx;
MTX_SYSINIT(so_global_mtx, &so_global_mtx, "so_glabel", MTX_DEF);

/*
 * General IPC sysctl name space, used by sockets and a variety of other IPC
 * types.
 */
SYSCTL_NODE(_kern, KERN_IPC, ipc, CTLFLAG_RW, 0, "IPC");

/*
 * Sysctl to get and set the maximum global sockets limit.
 * Notify protocols of the change so that they can update their dependent
 * limits as required.
 */
static int
sysctl_maxsockets(SYSCTL_HANDLER_ARGS)
{
	int error, newmaxsockets;

	newmaxsockets = maxsockets;
	error = sysctl_handle_int(oidp, &newmaxsockets, 0, req);
	if (error == 0 && req->newptr) {
		if (newmaxsockets > maxsockets) {
			maxsockets = newmaxsockets;
			if (maxsockets > ((maxfiles / 4) * 3)) {
				maxfiles = (maxsockets * 5) / 4;
				maxfilesperproc = (maxfiles * 9) / 10;
			}
			EVENTHANDLER_INVOKE(maxsockets_change);
		} else
			error = EINVAL;
	}
	return (error);
}

SYSCTL_PROC(_kern_ipc, OID_AUTO, maxsockets, CTLTYPE_INT|CTLFLAG_RW,
    &maxsockets, 0, sysctl_maxsockets, "IU",
    "Maximum number of sockets available");

/*
 * Initialise maxsockets.  This SYSINIT must be run after
 * tunable_mbinit().
 */
static void
init_maxsockets(void *ignored)
{

	TUNABLE_INT_FETCH("kern.ipc.maxsockets", &maxsockets);
	maxsockets = imax(maxsockets, imax(maxfiles, nmbclusters));
}
SYSINIT(param, SI_SUB_TUNABLES, SI_ORDER_ANY, init_maxsockets, NULL);

/*
 * Socket operation routines.  These routines are called by the routines in
 * sys_socket.c or from a system process, and implement the semantics of
 * socket operations by switching out to the protocol specific routines.
 */

/*
 * Get a socket structure from our zone, and initialize it.  Note that it
 * would probably be better to allocate socket and PCB at the same time, but
 * I'm not convinced that all the protocols can be easily modified to do
 * this.
 *
 * soalloc() returns a socket with a ref count of 0.
 */
static struct socket *
soalloc(struct vnet *vnet)
{
	struct socket *so;

	so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
	if (so == NULL)
		return (NULL);
#ifdef MAC
	if (mac_socket_init(so, M_NOWAIT) != 0) {
		uma_zfree(socket_zone, so);
		return (NULL);
	}
#endif
	SOCKBUF_LOCK_INIT(&so->so_snd, "so_snd");
	SOCKBUF_LOCK_INIT(&so->so_rcv, "so_rcv");
	sx_init(&so->so_snd.sb_sx, "so_snd_sx");
	sx_init(&so->so_rcv.sb_sx, "so_rcv_sx");
	TAILQ_INIT(&so->so_aiojobq);
	mtx_lock(&so_global_mtx);
	so->so_gencnt = ++so_gencnt;
	++numopensockets;
#ifdef VIMAGE
	++vnet->sockcnt;	/* Locked with so_global_mtx. */
	so->so_vnet = vnet;
#endif
	mtx_unlock(&so_global_mtx);
	return (so);
}

/*
 * Free the storage associated with a socket at the socket layer, tear down
 * locks, labels, etc.  All protocol state is assumed already to have been
 * torn down (and possibly never set up) by the caller.
 */
static void
sodealloc(struct socket *so)
{

	KASSERT(so->so_count == 0, ("sodealloc(): so_count %d", so->so_count));
	KASSERT(so->so_pcb == NULL, ("sodealloc(): so_pcb != NULL"));

	mtx_lock(&so_global_mtx);
	so->so_gencnt = ++so_gencnt;
	--numopensockets;	/* Could be below, but faster here. */
#ifdef VIMAGE
	--so->so_vnet->sockcnt;
#endif
	mtx_unlock(&so_global_mtx);
	if (so->so_rcv.sb_hiwat)
		(void)chgsbsize(so->so_cred->cr_uidinfo,
		    &so->so_rcv.sb_hiwat, 0, RLIM_INFINITY);
	if (so->so_snd.sb_hiwat)
		(void)chgsbsize(so->so_cred->cr_uidinfo,
		    &so->so_snd.sb_hiwat, 0, RLIM_INFINITY);
#ifdef INET
	/* remove accept filter if one is present. */
	if (so->so_accf != NULL)
		do_setopt_accept_filter(so, NULL);
#endif
#ifdef MAC
	mac_socket_destroy(so);
#endif
	crfree(so->so_cred);
	sx_destroy(&so->so_snd.sb_sx);
	sx_destroy(&so->so_rcv.sb_sx);
	SOCKBUF_LOCK_DESTROY(&so->so_snd);
	SOCKBUF_LOCK_DESTROY(&so->so_rcv);
	uma_zfree(socket_zone, so);
}

/*
 * socreate returns a socket with a ref count of 1.  The socket should be
 * closed with soclose().
 */
int
socreate(int dom, struct socket **aso, int type, int proto,
    struct ucred *cred, struct thread *td)
{
	struct protosw *prp;
	struct socket *so;
	int error;

	if (proto)
		prp = pffindproto(dom, proto, type);
	else
		prp = pffindtype(dom, type);

	if (prp == NULL || prp->pr_usrreqs->pru_attach == NULL ||
	    prp->pr_usrreqs->pru_attach == pru_attach_notsupp)
		return (EPROTONOSUPPORT);

	if (prison_check_af(cred, prp->pr_domain->dom_family) != 0)
		return (EPROTONOSUPPORT);

	if (prp->pr_type != type)
		return (EPROTOTYPE);
	so = soalloc(TD_TO_VNET(td));
	if (so == NULL)
		return (ENOBUFS);

	TAILQ_INIT(&so->so_incomp);
	TAILQ_INIT(&so->so_comp);
	so->so_type = type;
	so->so_cred = crhold(cred);
	if ((prp->pr_domain->dom_family == PF_INET) ||
	    (prp->pr_domain->dom_family == PF_ROUTE))
		so->so_fibnum = td->td_proc->p_fibnum;
	else
		so->so_fibnum = 0;
	so->so_proto = prp;
#ifdef MAC
	mac_socket_create(cred, so);
#endif
	knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
	    NULL, NULL, NULL);
	knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
	    NULL, NULL, NULL);
	so->so_count = 1;
	/*
	 * Auto-sizing of socket buffers is managed by the protocols and
	 * the appropriate flags must be set in the pru_attach function.
	 */
	CURVNET_SET(so->so_vnet);
	error = (*prp->pr_usrreqs->pru_attach)(so, proto, td);
	CURVNET_RESTORE();
	if (error) {
		KASSERT(so->so_count == 1, ("socreate: so_count %d",
		    so->so_count));
		so->so_count = 0;
		sodealloc(so);
		return (error);
	}
	*aso = so;
	return (0);
}

#ifdef REGRESSION
static int regression_sonewconn_earlytest = 1;
SYSCTL_INT(_regression, OID_AUTO, sonewconn_earlytest, CTLFLAG_RW,
    &regression_sonewconn_earlytest, 0, "Perform early sonewconn limit test");
#endif

/*
 * When an attempt at a new connection is noted on a socket which accepts
 * connections, sonewconn is called.  If the connection is possible (subject
 * to space constraints, etc.) then we allocate a new structure, properly
 * linked into the data structure of the original socket, and return this.
 * Connstatus may be 0, or SO_ISCONFIRMING, or SO_ISCONNECTED.
 *
 * Note: the ref count on the socket is 0 on return.
 */
struct socket *
sonewconn(struct socket *head, int connstatus)
{
	struct socket *so;
	int over;

	ACCEPT_LOCK();
	over = (head->so_qlen > 3 * head->so_qlimit / 2);
	ACCEPT_UNLOCK();
#ifdef REGRESSION
	if (regression_sonewconn_earlytest && over)
#else
	if (over)
#endif
		return (NULL);
	VNET_ASSERT(head->so_vnet);
	so = soalloc(head->so_vnet);
	if (so == NULL)
		return (NULL);
	if ((head->so_options & SO_ACCEPTFILTER) != 0)
		connstatus = 0;
	so->so_head = head;
	so->so_type = head->so_type;
	so->so_options = head->so_options &~ SO_ACCEPTCONN;
	so->so_linger = head->so_linger;
	so->so_state = head->so_state | SS_NOFDREF;
	so->so_proto = head->so_proto;
	so->so_cred = crhold(head->so_cred);
#ifdef MAC
	SOCK_LOCK(head);
	mac_socket_newconn(head, so);
	SOCK_UNLOCK(head);
#endif
	knlist_init(&so->so_rcv.sb_sel.si_note, SOCKBUF_MTX(&so->so_rcv),
	    NULL, NULL, NULL);
	knlist_init(&so->so_snd.sb_sel.si_note, SOCKBUF_MTX(&so->so_snd),
	    NULL, NULL, NULL);
	if (soreserve(so, head->so_snd.sb_hiwat, head->so_rcv.sb_hiwat) ||
	    (*so->so_proto->pr_usrreqs->pru_attach)(so, 0, NULL)) {
		sodealloc(so);
		return (NULL);
	}
	so->so_rcv.sb_lowat = head->so_rcv.sb_lowat;
	so->so_snd.sb_lowat = head->so_snd.sb_lowat;
	so->so_rcv.sb_timeo = head->so_rcv.sb_timeo;
	so->so_snd.sb_timeo = head->so_snd.sb_timeo;
	so->so_rcv.sb_flags |= head->so_rcv.sb_flags & SB_AUTOSIZE;
	so->so_snd.sb_flags |= head->so_snd.sb_flags & SB_AUTOSIZE;
	so->so_state |= connstatus;
	ACCEPT_LOCK();
	if (connstatus) {
		TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
		so->so_qstate |= SQ_COMP;
		head->so_qlen++;
	} else {
		/*
		 * Keep removing sockets from the head until there's room for
		 * us to insert on the tail.  In pre-locking revisions, this
		 * was a simple if(), but as we could be racing with other
		 * threads and soabort() requires dropping locks, we must
		 * loop waiting for the condition to be true.
		 */
		while (head->so_incqlen > head->so_qlimit) {
			struct socket *sp;
			sp = TAILQ_FIRST(&head->so_incomp);
			TAILQ_REMOVE(&head->so_incomp, sp, so_list);
			head->so_incqlen--;
			sp->so_qstate &= ~SQ_INCOMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		TAILQ_INSERT_TAIL(&head->so_incomp, so, so_list);
		so->so_qstate |= SQ_INCOMP;
		head->so_incqlen++;
	}
	ACCEPT_UNLOCK();
	if (connstatus) {
		sorwakeup(head);
		wakeup_one(&head->so_timeo);
	}
	return (so);
}

int
sobind(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	int error;

	CURVNET_SET(so->so_vnet);
	error = (*so->so_proto->pr_usrreqs->pru_bind)(so, nam, td);
	CURVNET_RESTORE();
	return error;
}

/*
 * solisten() transitions a socket from a non-listening state to a listening
 * state, but can also be used to update the listen queue depth on an
 * existing listen socket.  The protocol will call back into the sockets
 * layer using solisten_proto_check() and solisten_proto() to check and set
 * socket-layer listen state.  Call backs are used so that the protocol can
 * acquire both protocol and socket layer locks in whatever order is required
 * by the protocol.
 *
 * Protocol implementors are advised to hold the socket lock across the
 * socket-layer test and set to avoid races at the socket layer.
 */
int
solisten(struct socket *so, int backlog, struct thread *td)
{

	return ((*so->so_proto->pr_usrreqs->pru_listen)(so, backlog, td));
}

int
solisten_proto_check(struct socket *so)
{

	SOCK_LOCK_ASSERT(so);

	if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
	    SS_ISDISCONNECTING))
		return (EINVAL);
	return (0);
}

void
solisten_proto(struct socket *so, int backlog)
{

	SOCK_LOCK_ASSERT(so);

	if (backlog < 0 || backlog > somaxconn)
		backlog = somaxconn;
	so->so_qlimit = backlog;
	so->so_options |= SO_ACCEPTCONN;
}

/*
 * Attempt to free a socket.  This should really be sotryfree().
 *
 * sofree() will succeed if:
 *
 * - There are no outstanding file descriptor references or related consumers
 *   (so_count == 0).
 *
 * - The socket has been closed by user space, if ever open (SS_NOFDREF).
 *
 * - The protocol does not have an outstanding strong reference on the socket
 *   (SS_PROTOREF).
 *
 * - The socket is not in a completed connection queue, so a process has been
 *   notified that it is present.  If it is removed, the user process may
 *   block in accept() despite select() saying the socket was ready.
 *
 * Otherwise, it will quietly abort so that a future call to sofree(), when
 * conditions are right, can succeed.
 */
void
sofree(struct socket *so)
{
	struct protosw *pr = so->so_proto;
	struct socket *head;

	ACCEPT_LOCK_ASSERT();
	SOCK_LOCK_ASSERT(so);

	if ((so->so_state & SS_NOFDREF) == 0 || so->so_count != 0 ||
	    (so->so_state & SS_PROTOREF) || (so->so_qstate & SQ_COMP)) {
		SOCK_UNLOCK(so);
		ACCEPT_UNLOCK();
		return;
	}

	head = so->so_head;
	if (head != NULL) {
		KASSERT((so->so_qstate & SQ_COMP) != 0 ||
		    (so->so_qstate & SQ_INCOMP) != 0,
		    ("sofree: so_head != NULL, but neither SQ_COMP nor "
		    "SQ_INCOMP"));
		KASSERT((so->so_qstate & SQ_COMP) == 0 ||
		    (so->so_qstate & SQ_INCOMP) == 0,
		    ("sofree: so->so_qstate is SQ_COMP and also SQ_INCOMP"));
		TAILQ_REMOVE(&head->so_incomp, so, so_list);
		head->so_incqlen--;
		so->so_qstate &= ~SQ_INCOMP;
		so->so_head = NULL;
	}
	KASSERT((so->so_qstate & SQ_COMP) == 0 &&
	    (so->so_qstate & SQ_INCOMP) == 0,
	    ("sofree: so_head == NULL, but still SQ_COMP(%d) or SQ_INCOMP(%d)",
	    so->so_qstate & SQ_COMP, so->so_qstate & SQ_INCOMP));
	if (so->so_options & SO_ACCEPTCONN) {
		KASSERT((TAILQ_EMPTY(&so->so_comp)), ("sofree: so_comp populated"));
		KASSERT((TAILQ_EMPTY(&so->so_incomp)), ("sofree: so_comp populated"));
	}
	SOCK_UNLOCK(so);
	ACCEPT_UNLOCK();

	if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
		(*pr->pr_domain->dom_dispose)(so->so_rcv.sb_mb);
	if (pr->pr_usrreqs->pru_detach != NULL)
		(*pr->pr_usrreqs->pru_detach)(so);

	/*
	 * From this point on, we assume that no other references to this
	 * socket exist anywhere else in the stack.  Therefore, no locks need
	 * to be acquired or held.
	 *
	 * We used to do a lot of socket buffer and socket locking here, as
	 * well as invoke sorflush() and perform wakeups.  The direct call to
	 * dom_dispose() and sbrelease_internal() are an inlining of what was
	 * necessary from sorflush().
	 *
	 * Notice that the socket buffer and kqueue state are torn down
	 * before calling pru_detach.  This means that protocols should not
	 * assume they can perform socket wakeups, etc, in their detach code.
	 */
	sbdestroy(&so->so_snd, so);
	sbdestroy(&so->so_rcv, so);
	knlist_destroy(&so->so_rcv.sb_sel.si_note);
	knlist_destroy(&so->so_snd.sb_sel.si_note);
	sodealloc(so);
}

/*
 * Close a socket on last file table reference removal.  Initiate disconnect
 * if connected.  Free socket when disconnect complete.
 *
 * This function will sorele() the socket.  Note that soclose() may be called
 * prior to the ref count reaching zero.  The actual socket structure will
 * not be freed until the ref count reaches zero.
 */
int
soclose(struct socket *so)
{
	int error = 0;

	KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

	CURVNET_SET(so->so_vnet);
	funsetown(&so->so_sigio);
	if (so->so_state & SS_ISCONNECTED) {
		if ((so->so_state & SS_ISDISCONNECTING) == 0) {
			error = sodisconnect(so);
			if (error)
				goto drop;
		}
		if (so->so_options & SO_LINGER) {
			if ((so->so_state & SS_ISDISCONNECTING) &&
			    (so->so_state & SS_NBIO))
				goto drop;
			while (so->so_state & SS_ISCONNECTED) {
				error = tsleep(&so->so_timeo,
				    PSOCK | PCATCH, "soclos", so->so_linger * hz);
				if (error)
					break;
			}
		}
	}

drop:
	if (so->so_proto->pr_usrreqs->pru_close != NULL)
		(*so->so_proto->pr_usrreqs->pru_close)(so);
	if (so->so_options & SO_ACCEPTCONN) {
		struct socket *sp;
		ACCEPT_LOCK();
		while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
			TAILQ_REMOVE(&so->so_incomp, sp, so_list);
			so->so_incqlen--;
			sp->so_qstate &= ~SQ_INCOMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		while ((sp = TAILQ_FIRST(&so->so_comp)) != NULL) {
			TAILQ_REMOVE(&so->so_comp, sp, so_list);
			so->so_qlen--;
			sp->so_qstate &= ~SQ_COMP;
			sp->so_head = NULL;
			ACCEPT_UNLOCK();
			soabort(sp);
			ACCEPT_LOCK();
		}
		ACCEPT_UNLOCK();
	}
	ACCEPT_LOCK();
	SOCK_LOCK(so);
	KASSERT((so->so_state & SS_NOFDREF) == 0, ("soclose: NOFDREF"));
	so->so_state |= SS_NOFDREF;
	sorele(so);
	CURVNET_RESTORE();
	return (error);
}

/*
 * soabort() is used to abruptly tear down a connection, such as when a
 * resource limit is reached (listen queue depth exceeded), or if a listen
 * socket is closed while there are sockets waiting to be accepted.
 *
 * This interface is tricky, because it is called on an unreferenced socket,
 * and must be called only by a thread that has actually removed the socket
 * from the listen queue it was on, or races with other threads are risked.
 *
 * This interface will call into the protocol code, so must not be called
 * with any socket locks held.  Protocols do call it while holding their own
 * recursible protocol mutexes, but this is something that should be subject
 * to review in the future.
 */
void
soabort(struct socket *so)
{

	/*
	 * In as much as is possible, assert that no references to this
	 * socket are held.  This is not quite the same as asserting that the
	 * current thread is responsible for arranging for no references, but
	 * is as close as we can get for now.
	 */
	KASSERT(so->so_count == 0, ("soabort: so_count"));
	KASSERT((so->so_state & SS_PROTOREF) == 0, ("soabort: SS_PROTOREF"));
	KASSERT(so->so_state & SS_NOFDREF, ("soabort: !SS_NOFDREF"));
	KASSERT((so->so_state & SQ_COMP) == 0, ("soabort: SQ_COMP"));
	KASSERT((so->so_state & SQ_INCOMP) == 0, ("soabort: SQ_INCOMP"));

	if (so->so_proto->pr_usrreqs->pru_abort != NULL)
		(*so->so_proto->pr_usrreqs->pru_abort)(so);
	ACCEPT_LOCK();
	SOCK_LOCK(so);
	sofree(so);
}

int
soaccept(struct socket *so, struct sockaddr **nam)
{
	int error;

	SOCK_LOCK(so);
	KASSERT((so->so_state & SS_NOFDREF) != 0, ("soaccept: !NOFDREF"));
	so->so_state &= ~SS_NOFDREF;
	SOCK_UNLOCK(so);
	error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
	return (error);
}

int
soconnect(struct socket *so, struct sockaddr *nam, struct thread *td)
{
	int error;

	if (so->so_options & SO_ACCEPTCONN)
		return (EOPNOTSUPP);
	/*
	 * If protocol is connection-based, can only connect once.
	 * Otherwise, if connected, try to disconnect first.  This allows
	 * user to disconnect by connecting to, e.g., a null address.
	 */
	if (so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING) &&
	    ((so->so_proto->pr_flags & PR_CONNREQUIRED) ||
	    (error = sodisconnect(so)))) {
		error = EISCONN;
	} else {
		/*
		 * Prevent accumulated error from previous connection from
		 * biting us.
		 */
		so->so_error = 0;
		CURVNET_SET(so->so_vnet);
		error = (*so->so_proto->pr_usrreqs->pru_connect)(so, nam, td);
		CURVNET_RESTORE();
	}

	return (error);
}

int
soconnect2(struct socket *so1, struct socket *so2)
{

	return ((*so1->so_proto->pr_usrreqs->pru_connect2)(so1, so2));
}

int
sodisconnect(struct socket *so)
{
	int error;

	if ((so->so_state & SS_ISCONNECTED) == 0)
		return (ENOTCONN);
	if (so->so_state & SS_ISDISCONNECTING)
		return (EALREADY);
	error = (*so->so_proto->pr_usrreqs->pru_disconnect)(so);
	return (error);
}

#ifdef ZERO_COPY_SOCKETS
struct so_zerocopy_stats{
	int size_ok;
	int align_ok;
	int found_ifp;
};
struct so_zerocopy_stats so_zerocp_stats = {0,0,0};
#include <netinet/in.h>
#include <net/route.h>
#include <netinet/in_pcb.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <vm/vm_object.h>

/*
 * sosend_copyin() is only used if zero copy sockets are enabled.  Otherwise
 * sosend_dgram() and sosend_generic() use m_uiotombuf().
 *
 * sosend_copyin() accepts a uio and prepares an mbuf chain holding part or
 * all of the data referenced by the uio.  If desired, it uses zero-copy.
 * *space will be updated to reflect data copied in.
 *
 * NB: If atomic I/O is requested, the caller must already have checked that
 * space can hold resid bytes.
 *
 * NB: In the event of an error, the caller may need to free the partial
 * chain pointed to by *mpp.  The contents of both *uio and *space may be
 * modified even in the case of an error.
 */
static int
sosend_copyin(struct uio *uio, struct mbuf **retmp, int atomic, long *space,
    int flags)
{
	struct mbuf *m, **mp, *top;
	long len, resid;
	int error;
#ifdef ZERO_COPY_SOCKETS
	int cow_send;
#endif

	*retmp = top = NULL;
	mp = &top;
	len = 0;
	resid = uio->uio_resid;
	error = 0;
	do {
#ifdef ZERO_COPY_SOCKETS
		cow_send = 0;
#endif /* ZERO_COPY_SOCKETS */
		if (resid >= MINCLSIZE) {
#ifdef ZERO_COPY_SOCKETS
			if (top == NULL) {
				m = m_gethdr(M_WAITOK, MT_DATA);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;
			} else
				m = m_get(M_WAITOK, MT_DATA);
			if (so_zero_copy_send &&
			    resid>=PAGE_SIZE &&
			    *space>=PAGE_SIZE &&
			    uio->uio_iov->iov_len>=PAGE_SIZE) {
				so_zerocp_stats.size_ok++;
				so_zerocp_stats.align_ok++;
				cow_send = socow_setup(m, uio);
				len = cow_send;
			}
			if (!cow_send) {
				m_clget(m, M_WAITOK);
				len = min(min(MCLBYTES, resid), *space);
			}
#else /* ZERO_COPY_SOCKETS */
			if (top == NULL) {
				m = m_getcl(M_WAIT, MT_DATA, M_PKTHDR);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;
			} else
				m = m_getcl(M_WAIT, MT_DATA, 0);
			len = min(min(MCLBYTES, resid), *space);
#endif /* ZERO_COPY_SOCKETS */
		} else {
			if (top == NULL) {
				m = m_gethdr(M_WAIT, MT_DATA);
				m->m_pkthdr.len = 0;
				m->m_pkthdr.rcvif = NULL;

				len = min(min(MHLEN, resid), *space);
				/*
				 * For datagram protocols, leave room
				 * for protocol headers in first mbuf.
				 */
				if (atomic && m && len < MHLEN)
					MH_ALIGN(m, len);
			} else {
				m = m_get(M_WAIT, MT_DATA);
				len = min(min(MLEN, resid), *space);
			}
		}
		if (m == NULL) {
			error = ENOBUFS;
			goto out;
		}

		*space -= len;
#ifdef ZERO_COPY_SOCKETS
		if (cow_send)
			error = 0;
		else
#endif /* ZERO_COPY_SOCKETS */
		error = uiomove(mtod(m, void *), (int)len, uio);
		resid = uio->uio_resid;
		m->m_len = len;
		*mp = m;
		top->m_pkthdr.len += len;
		if (error)
			goto out;
		mp = &m->m_next;
		if (resid <= 0) {
			if (flags & MSG_EOR)
				top->m_flags |= M_EOR;
			break;
		}
	} while (*space > 0 && atomic);
out:
	*retmp = top;
	return (error);
}
#endif /*ZERO_COPY_SOCKETS*/

#define	SBLOCKWAIT(f)	(((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)

int
sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{
	long space, resid;
	int clen = 0, error, dontroute;
#ifdef ZERO_COPY_SOCKETS
	int atomic = sosendallatonce(so) || top;
#endif

	KASSERT(so->so_type == SOCK_DGRAM, ("sodgram_send: !SOCK_DGRAM"));
	KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
	    ("sodgram_send: !PR_ATOMIC"));

	if (uio != NULL)
		resid = uio->uio_resid;
	else
		resid = top->m_pkthdr.len;
	/*
	 * In theory resid should be unsigned.  However, space must be
	 * signed, as it might be less than 0 if we over-committed, and we
	 * must use a signed comparison of space and resid.  On the other
	 * hand, a negative resid causes us to loop sending 0-length
	 * segments to the protocol.
	 *
	 * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM
	 * type sockets since that's an error.
	 */
	if (resid < 0) {
		error = EINVAL;
		goto out;
	}

	dontroute =
	    (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0;
	if (td != NULL)
		td->td_ru.ru_msgsnd++;
	if (control != NULL)
		clen = control->m_len;

	SOCKBUF_LOCK(&so->so_snd);
	if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
		SOCKBUF_UNLOCK(&so->so_snd);
		error = EPIPE;
		goto out;
	}
	if (so->so_error) {
		error = so->so_error;
		so->so_error = 0;
		SOCKBUF_UNLOCK(&so->so_snd);
		goto out;
	}
	if ((so->so_state & SS_ISCONNECTED) == 0) {
		/*
		 * `sendto' and `sendmsg' are allowed on a connection-based
		 * socket if it supports implied connect.  Return ENOTCONN if
		 * not connected and no address is supplied.
		 */
		if ((so->so_proto->pr_flags & PR_CONNREQUIRED) &&
		    (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) {
			if ((so->so_state & SS_ISCONFIRMING) == 0 &&
			    !(resid == 0 && clen != 0)) {
				SOCKBUF_UNLOCK(&so->so_snd);
				error = ENOTCONN;
				goto out;
			}
		} else if (addr == NULL) {
			if (so->so_proto->pr_flags & PR_CONNREQUIRED)
				error = ENOTCONN;
			else
				error = EDESTADDRREQ;
			SOCKBUF_UNLOCK(&so->so_snd);
			goto out;
		}
	}

	/*
	 * Do we need MSG_OOB support in SOCK_DGRAM?  Signs here may be a
	 * problem and need fixing.
	 */
	space = sbspace(&so->so_snd);
	if (flags & MSG_OOB)
		space += 1024;
	space -= clen;
	SOCKBUF_UNLOCK(&so->so_snd);
	if (resid > space) {
		error = EMSGSIZE;
		goto out;
	}
	if (uio == NULL) {
		resid = 0;
		if (flags & MSG_EOR)
			top->m_flags |= M_EOR;
	} else {
#ifdef ZERO_COPY_SOCKETS
		error = sosend_copyin(uio, &top, atomic, &space, flags);
		if (error)
			goto out;
#else
		/*
		 * Copy the data from userland into a mbuf chain.
		 * If no data is to be copied in, a single empty mbuf
		 * is returned.
		 */
		top = m_uiotombuf(uio, M_WAITOK, space, max_hdr,
		    (M_PKTHDR | ((flags & MSG_EOR) ? M_EOR : 0)));
		if (top == NULL) {
			error = EFAULT;	/* only possible error */
			goto out;
		}
		space -= resid - uio->uio_resid;
#endif
		resid = uio->uio_resid;
	}
	KASSERT(resid == 0, ("sosend_dgram: resid != 0"));
	/*
	 * XXXRW: Frobbing SO_DONTROUTE here is even worse without sblock
	 * than with.
	 */
	if (dontroute) {
		SOCK_LOCK(so);
		so->so_options |= SO_DONTROUTE;
		SOCK_UNLOCK(so);
	}
	/*
	 * XXX all the SBS_CANTSENDMORE checks previously done could be out
	 * of date.  We could have received a reset packet in an interrupt or
	 * maybe we slept while doing page faults in uiomove() etc.  We could
	 * probably recheck again inside the locking protection here, but
	 * there are probably other places that this also happens.  We must
	 * rethink this.
	 */
	error = (*so->so_proto->pr_usrreqs->pru_send)(so,
	    (flags & MSG_OOB) ? PRUS_OOB :
	/*
	 * If the user set MSG_EOF, the protocol understands this flag and
	 * nothing left to send then use PRU_SEND_EOF instead of PRU_SEND.
	 */
	    ((flags & MSG_EOF) &&
	     (so->so_proto->pr_flags & PR_IMPLOPCL) &&
	     (resid <= 0)) ?
		PRUS_EOF :
		/* If there is more to send set PRUS_MORETOCOME */
		(resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
	    top, addr, control, td);
	if (dontroute) {
		SOCK_LOCK(so);
		so->so_options &= ~SO_DONTROUTE;
		SOCK_UNLOCK(so);
	}
	clen = 0;
	control = NULL;
	top = NULL;
out:
	if (top != NULL)
		m_freem(top);
	if (control != NULL)
		m_freem(control);
	return (error);
}

/*
 * Send on a socket.  If send must go all at once and message is larger than
 * send buffering, then hard error.  Lock against other senders.
 * If must go all at once and not enough room now, then inform user that
 * this would block and do nothing.  Otherwise, if nonblocking, send as much
 * as possible.  The data to be sent is described by "uio" if nonzero,
 * otherwise by the mbuf chain "top" (which must be null if uio is not).
 * Data provided in mbuf chain must be small enough to send all at once.
 *
 * Returns nonzero on error, timeout or signal; callers must check for short
 * counts if EINTR/ERESTART are returned.  Data and control buffers are freed
 * on return.
 */
int
sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{
	long space, resid;
	int clen = 0, error, dontroute;
	int atomic = sosendallatonce(so) || top;

	if (uio != NULL)
		resid = uio->uio_resid;
	else
		resid = top->m_pkthdr.len;
	/*
	 * In theory resid should be unsigned.  However, space must be
	 * signed, as it might be less than 0 if we over-committed, and we
	 * must use a signed comparison of space and resid.  On the other
	 * hand, a negative resid causes us to loop sending 0-length
	 * segments to the protocol.
	 *
	 * Also check to make sure that MSG_EOR isn't used on SOCK_STREAM
	 * type sockets since that's an error.
	 */
	if (resid < 0 || (so->so_type == SOCK_STREAM && (flags & MSG_EOR))) {
		error = EINVAL;
		goto out;
	}

	dontroute =
	    (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0 &&
	    (so->so_proto->pr_flags & PR_ATOMIC);
	if (td != NULL)
		td->td_ru.ru_msgsnd++;
	if (control != NULL)
		clen = control->m_len;

	error = sblock(&so->so_snd, SBLOCKWAIT(flags));
	if (error)
		goto out;

restart:
	do {
		SOCKBUF_LOCK(&so->so_snd);
		if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
			SOCKBUF_UNLOCK(&so->so_snd);
			error = EPIPE;
			goto release;
		}
		if (so->so_error) {
			error = so->so_error;
			so->so_error = 0;
			SOCKBUF_UNLOCK(&so->so_snd);
			goto release;
		}
		if ((so->so_state & SS_ISCONNECTED) == 0) {
			/*
			 * `sendto' and `sendmsg' are allowed on a connection-
			 * based socket if it supports implied connect.
			 * Return ENOTCONN if not connected and no address is
			 * supplied.
1178 / 1179* if ((so->so_proto->pr_flags & PR_CONNREQUIRED) && 1180 (so->so_proto->pr_flags & PR_IMPLOPCL) == 0) { 1181 if ((so->so_state & SS_ISCONFIRMING) == 0 && 1182 !(resid == 0 && clen != 0)) { 1183 SOCKBUF_UNLOCK(&so->so_snd); 1184 error = ENOTCONN; 1185 goto release; 1186 } 1187 } else if (addr == NULL) { 1188 SOCKBUF_UNLOCK(&so->so_snd); 1189 if (so->so_proto->pr_flags & PR_CONNREQUIRED) 1190 error = ENOTCONN; 1191 else 1192 error = EDESTADDRREQ; 1193 goto release; 1194 } 1195 } 1196 space = sbspace(&so->so_snd); 1197 if (flags & MSG_OOB) 1198 space += 1024; 1199 if ((atomic && resid > so->so_snd.sb_hiwat) \|\| 1200 clen > so->so_snd.sb_hiwat) { 1201 SOCKBUF_UNLOCK(&so->so_snd); 1202 error = EMSGSIZE; 1203 goto release; 1204 } 1205 if (space < resid + clen && 1206 (atomic \|\| space < so->so_snd.sb_lowat \|\| space < clen)) { 1207 if ((so->so_state & SS_NBIO) \|\| (flags & MSG_NBIO)) { 1208 SOCKBUF_UNLOCK(&so->so_snd); 1209 error = EWOULDBLOCK; 1210 goto release; 1211 } 1212 error = sbwait(&so->so_snd); 1213 SOCKBUF_UNLOCK(&so->so_snd); 1214 if (error) 1215 goto release; 1216 goto restart; 1217 } 1218 SOCKBUF_UNLOCK(&so->so_snd); 1219 space -= clen; 1220 do { 1221 if (uio == NULL) { 1222 resid = 0; 1223 if (flags & MSG_EOR) 1224 top->m_flags \|= M_EOR; 1225 } else { 1226#ifdef ZERO_COPY_SOCKETS 1227 error = sosend_copyin(uio, &top, atomic, 1228 &space, flags); 1229 if (error != 0) 1230 goto release; 1231#else 1232 /* 1233 * Copy the data from userland into a mbuf 1234 * chain. If no data is to be copied in, 1235 * a single empty mbuf is returned. 1236 / 1237* top = m_uiotombuf(uio, M_WAITOK, space, 1238 (atomic ? max_hdr : 0), 1239 (atomic ? M_PKTHDR : 0) \| 1240 ((flags & MSG_EOR) ? 
M_EOR : 0)); 1241 if (top == NULL) { 1242 error = EFAULT; /* only possible error / 1243* goto release; 1244 } 1245 space -= resid - uio->uio_resid; 1246#endif 1247 resid = uio->uio_resid; 1248 } 1249 if (dontroute) { 1250 SOCK_LOCK(so); 1251 so->so_options \|= SO_DONTROUTE; 1252 SOCK_UNLOCK(so); 1253 } 1254 /* 1255 * XXX all the SBS_CANTSENDMORE checks previously 1256 * done could be out of date. We could have recieved 1257 * a reset packet in an interrupt or maybe we slept 1258 * while doing page faults in uiomove() etc. We 1259 * could probably recheck again inside the locking 1260 * protection here, but there are probably other 1261 * places that this also happens. We must rethink 1262 * this. 1263 / 1264* error = (so->so_proto->pr_usrreqs->pru_send)(so, 1265* (flags & MSG_OOB) ? PRUS_OOB : 1266 /* 1267 * If the user set MSG_EOF, the protocol understands 1268 * this flag and nothing left to send then use 1269 * PRU_SEND_EOF instead of PRU_SEND. 1270 / 1271* ((flags & MSG_EOF) && 1272 (so->so_proto->pr_flags & PR_IMPLOPCL) && 1273 (resid <= 0)) ? 1274 PRUS_EOF : 1275 /* If there is more to send set PRUS_MORETOCOME. / 1276* (resid > 0 && space > 0) ? 
				PRUS_MORETOCOME : 0,
			    top, addr, control, td);
			if (dontroute) {
				SOCK_LOCK(so);
				so->so_options &= ~SO_DONTROUTE;
				SOCK_UNLOCK(so);
			}
			clen = 0;
			control = NULL;
			top = NULL;
			if (error)
				goto release;
		} while (resid && space > 0);
	} while (resid);

release:
	sbunlock(&so->so_snd);
out:
	if (top != NULL)
		m_freem(top);
	if (control != NULL)
		m_freem(control);
	return (error);
}

int
sosend(struct socket *so, struct sockaddr *addr, struct uio *uio,
    struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
{
	int error;

	CURVNET_SET(so->so_vnet);
	error = so->so_proto->pr_usrreqs->pru_sosend(so, addr, uio, top,
	    control, flags, td);
	CURVNET_RESTORE();
	return (error);
}

/*
 * The part of soreceive() that implements reading non-inline out-of-band
 * data from a socket.  For more complete comments, see soreceive(), from
 * which this code originated.
 *
 * Note that soreceive_rcvoob(), unlike the remainder of soreceive(), is
 * unable to return an mbuf chain to the caller.
 */
static int
soreceive_rcvoob(struct socket *so, struct uio *uio, int flags)
{
	struct protosw *pr = so->so_proto;
	struct mbuf *m;
	int error;

	KASSERT(flags & MSG_OOB, ("soreceive_rcvoob: (flags & MSG_OOB) == 0"));

	m = m_get(M_WAIT, MT_DATA);
	error = (*pr->pr_usrreqs->pru_rcvoob)(so, m, flags & MSG_PEEK);
	if (error)
		goto bad;
	do {
#ifdef ZERO_COPY_SOCKETS
		if (so_zero_copy_receive) {
			int disposable;

			if ((m->m_flags & M_EXT)
			 && (m->m_ext.ext_type == EXT_DISPOSABLE))
				disposable = 1;
			else
				disposable = 0;

			error = uiomoveco(mtod(m, void *),
					  min(uio->uio_resid, m->m_len),
					  uio, disposable);
		} else
#endif /* ZERO_COPY_SOCKETS */
		error = uiomove(mtod(m, void *),
		    (int) min(uio->uio_resid, m->m_len), uio);
		m = m_free(m);
	} while (uio->uio_resid && error == 0 && m);
bad:
	if (m != NULL)
		m_freem(m);
	return (error);
}

/*
 * Following replacement or removal of the first mbuf on the first mbuf chain
 * of a socket buffer, push necessary state changes back into the socket
 * buffer so that other consumers see the values consistently.  'nextrecord'
 * is the caller's locally stored value of the original value of
 * sb->sb_mb->m_nextpkt which must be restored when the lead mbuf changes.
 * NOTE: 'nextrecord' may be NULL.
 */
static __inline void
sockbuf_pushsync(struct sockbuf *sb, struct mbuf *nextrecord)
{

	SOCKBUF_LOCK_ASSERT(sb);
	/*
	 * First, update for the new value of nextrecord.  If necessary, make
	 * it the first record.
	 */
	if (sb->sb_mb != NULL)
		sb->sb_mb->m_nextpkt = nextrecord;
	else
		sb->sb_mb = nextrecord;

	/*
	 * Now update any dependent socket buffer fields to reflect the new
	 * state.
	 * This is an expanded inline of SB_EMPTY_FIXUP(), with the
	 * addition of a second clause that takes care of the case where
	 * sb_mb has been updated, but remains the last record.
	 */
	if (sb->sb_mb == NULL) {
		sb->sb_mbtail = NULL;
		sb->sb_lastrecord = NULL;
	} else if (sb->sb_mb->m_nextpkt == NULL)
		sb->sb_lastrecord = sb->sb_mb;
}


/*
 * Implement receive operations on a socket.  We depend on the way that
 * records are added to the sockbuf by sbappend.  In particular, each record
 * (mbufs linked through m_next) must begin with an address if the protocol
 * so specifies, followed by an optional mbuf or mbufs containing ancillary
 * data, and then zero or more mbufs of data.  In order to allow parallelism
 * between network receive and copying to user space, as well as avoid
 * sleeping with a mutex held, we release the socket buffer mutex during the
 * user space copy.  Although the sockbuf is locked, new data may still be
 * appended, and thus we must maintain consistency of the sockbuf during that
 * time.
 *
 * The caller may receive the data as a single mbuf chain by supplying an
 * mbuf **mp0 for use in returning the chain.  The uio is then used only for
 * the count in uio_resid.
 */
int
soreceive_generic(struct socket *so, struct sockaddr **psa, struct uio *uio,
    struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{
	struct mbuf *m, **mp;
	int flags, len, error, offset;
	struct protosw *pr = so->so_proto;
	struct mbuf *nextrecord;
	int moff, type = 0;
	int orig_resid = uio->uio_resid;

	mp = mp0;
	if (psa != NULL)
		*psa = NULL;
	if (controlp != NULL)
		*controlp = NULL;
	if (flagsp != NULL)
		flags = *flagsp &~ MSG_EOR;
	else
		flags = 0;
	if (flags & MSG_OOB)
		return (soreceive_rcvoob(so, uio, flags));
	if (mp != NULL)
		*mp = NULL;
	if ((pr->pr_flags & PR_WANTRCVD) && (so->so_state & SS_ISCONFIRMING)
	    && uio->uio_resid)
		(*pr->pr_usrreqs->pru_rcvd)(so, 0);

	error = sblock(&so->so_rcv, SBLOCKWAIT(flags));
	if (error)
		return (error);

restart:
	SOCKBUF_LOCK(&so->so_rcv);
	m = so->so_rcv.sb_mb;
	/*
	 * If we have less data than requested, block awaiting more (subject
	 * to any timeout) if:
	 *   1. the current count is less than the low water mark, or
	 *   2. MSG_WAITALL is set, and it is possible to do the entire
	 *	receive operation at once if we block (resid <= hiwat).
	 *   3. MSG_DONTWAIT is not set
	 * If MSG_WAITALL is set but resid is larger than the receive buffer,
	 * we have to do the receive in sections, and thus risk returning a
	 * short count if a timeout or signal occurs after we start.
	 */
	if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
	    so->so_rcv.sb_cc < uio->uio_resid) &&
	    (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
	    ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
	    m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
		KASSERT(m != NULL || !so->so_rcv.sb_cc,
		    ("receive: m == %p so->so_rcv.sb_cc == %u",
		    m, so->so_rcv.sb_cc));
		if (so->so_error) {
			if (m != NULL)
				goto dontblock;
			error = so->so_error;
			if ((flags & MSG_PEEK) == 0)
				so->so_error = 0;
			SOCKBUF_UNLOCK(&so->so_rcv);
			goto release;
		}
		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
		if (so->so_rcv.sb_state & SBS_CANTRCVMORE) {
			if (m == NULL) {
				SOCKBUF_UNLOCK(&so->so_rcv);
				goto release;
			} else
				goto dontblock;
		}
		for (; m != NULL; m = m->m_next)
			if (m->m_type == MT_OOBDATA || (m->m_flags & M_EOR)) {
				m = so->so_rcv.sb_mb;
				goto dontblock;
			}
		if ((so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING)) == 0 &&
		    (so->so_proto->pr_flags & PR_CONNREQUIRED)) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			error = ENOTCONN;
			goto release;
		}
		if (uio->uio_resid == 0) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			goto release;
		}
		if ((so->so_state & SS_NBIO) ||
		    (flags & (MSG_DONTWAIT|MSG_NBIO))) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			error = EWOULDBLOCK;
			goto release;
		}
		SBLASTRECORDCHK(&so->so_rcv);
		SBLASTMBUFCHK(&so->so_rcv);
		error = sbwait(&so->so_rcv);
		SOCKBUF_UNLOCK(&so->so_rcv);
		if (error)
			goto release;
		goto restart;
	}
dontblock:
	/*
	 * From this point onward, we maintain 'nextrecord' as a cache of the
	 * pointer to the next record in the socket buffer.
	 * We must keep the
	 * various socket buffer pointers and local stack versions of the
	 * pointers in sync, pushing out modifications before dropping the
	 * socket buffer mutex, and re-reading them when picking it up.
	 *
	 * Otherwise, we will race with the network stack appending new data
	 * or records onto the socket buffer by using inconsistent/stale
	 * versions of the field, possibly resulting in socket buffer
	 * corruption.
	 *
	 * By holding the high-level sblock(), we prevent simultaneous
	 * readers from pulling off the front of the socket buffer.
	 */
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
	if (uio->uio_td)
		uio->uio_td->td_ru.ru_msgrcv++;
	KASSERT(m == so->so_rcv.sb_mb, ("soreceive: m != so->so_rcv.sb_mb"));
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);
	nextrecord = m->m_nextpkt;
	if (pr->pr_flags & PR_ADDR) {
		KASSERT(m->m_type == MT_SONAME,
		    ("m->m_type == %d", m->m_type));
		orig_resid = 0;
		if (psa != NULL)
			*psa = sodupsockaddr(mtod(m, struct sockaddr *),
			    M_NOWAIT);
		if (flags & MSG_PEEK) {
			m = m->m_next;
		} else {
			sbfree(&so->so_rcv, m);
			so->so_rcv.sb_mb = m_free(m);
			m = so->so_rcv.sb_mb;
			sockbuf_pushsync(&so->so_rcv, nextrecord);
		}
	}

	/*
	 * Process one or more MT_CONTROL mbufs present before any data mbufs
	 * in the first mbuf chain on the socket buffer.  If MSG_PEEK, we
	 * just copy the data; if !MSG_PEEK, we call into the protocol to
	 * perform externalization (or freeing if controlp == NULL).
	 */
	if (m != NULL && m->m_type == MT_CONTROL) {
		struct mbuf *cm = NULL, *cmn;
		struct mbuf **cme = &cm;

		do {
			if (flags & MSG_PEEK) {
				if (controlp != NULL) {
					*controlp = m_copy(m, 0, m->m_len);
					controlp = &(*controlp)->m_next;
				}
				m = m->m_next;
			} else {
				sbfree(&so->so_rcv, m);
				so->so_rcv.sb_mb = m->m_next;
				m->m_next = NULL;
				*cme = m;
				cme = &(*cme)->m_next;
				m = so->so_rcv.sb_mb;
			}
		} while (m != NULL && m->m_type == MT_CONTROL);
		if ((flags & MSG_PEEK) == 0)
			sockbuf_pushsync(&so->so_rcv, nextrecord);
		while (cm != NULL) {
			cmn = cm->m_next;
			cm->m_next = NULL;
			if (pr->pr_domain->dom_externalize != NULL) {
				SOCKBUF_UNLOCK(&so->so_rcv);
				error = (*pr->pr_domain->dom_externalize)
				    (cm, controlp);
				SOCKBUF_LOCK(&so->so_rcv);
			} else if (controlp != NULL)
				*controlp = cm;
			else
				m_freem(cm);
			if (controlp != NULL) {
				orig_resid = 0;
				while (*controlp != NULL)
					controlp = &(*controlp)->m_next;
			}
			cm = cmn;
		}
		if (m != NULL)
			nextrecord = so->so_rcv.sb_mb->m_nextpkt;
		else
			nextrecord = so->so_rcv.sb_mb;
		orig_resid = 0;
	}
	if (m != NULL) {
		if ((flags & MSG_PEEK) == 0) {
			KASSERT(m->m_nextpkt == nextrecord,
			    ("soreceive: post-control, nextrecord !sync"));
			if (nextrecord == NULL) {
				KASSERT(so->so_rcv.sb_mb == m,
				    ("soreceive: post-control, sb_mb!=m"));
				KASSERT(so->so_rcv.sb_lastrecord == m,
				    ("soreceive: post-control, lastrecord!=m"));
			}
		}
		type = m->m_type;
		if (type == MT_OOBDATA)
			flags |= MSG_OOB;
	} else {
		if ((flags & MSG_PEEK) == 0) {
			KASSERT(so->so_rcv.sb_mb == nextrecord,
			    ("soreceive: sb_mb != nextrecord"));
			if (so->so_rcv.sb_mb == NULL) {
				KASSERT(so->so_rcv.sb_lastrecord == NULL,
				    ("soreceive: sb_lastrecord != NULL"));
			}
		}
	}
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);

	/*
	 * Now continue to read any data mbufs off of the head of the socket
	 * buffer until the read request is satisfied.  Note that 'type' is
	 * used to store the type of any mbuf reads that have happened so far
	 * such that soreceive() can stop reading if the type changes, which
	 * causes soreceive() to return only one of regular data and inline
	 * out-of-band data in a single socket receive operation.
	 */
	moff = 0;
	offset = 0;
	while (m != NULL && uio->uio_resid > 0 && error == 0) {
		/*
		 * If the type of mbuf has changed since the last mbuf
		 * examined ('type'), end the receive operation.
		 */
		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
		if (m->m_type == MT_OOBDATA) {
			if (type != MT_OOBDATA)
				break;
		} else if (type == MT_OOBDATA)
			break;
		else
			KASSERT(m->m_type == MT_DATA,
			    ("m->m_type == %d", m->m_type));
		so->so_rcv.sb_state &= ~SBS_RCVATMARK;
		len = uio->uio_resid;
		if (so->so_oobmark && len > so->so_oobmark - offset)
			len = so->so_oobmark - offset;
		if (len > m->m_len - moff)
			len = m->m_len - moff;
		/*
		 * If mp is set, just pass back the mbufs.  Otherwise copy
		 * them out via the uio, then free.  Sockbuf must be
		 * consistent here (points to current mbuf, it points to next
		 * record) when we drop priority; we must note any additions
		 * to the sockbuf when we block interrupts again.
		 */
		if (mp == NULL) {
			SOCKBUF_LOCK_ASSERT(&so->so_rcv);
			SBLASTRECORDCHK(&so->so_rcv);
			SBLASTMBUFCHK(&so->so_rcv);
			SOCKBUF_UNLOCK(&so->so_rcv);
#ifdef ZERO_COPY_SOCKETS
			if (so_zero_copy_receive) {
				int disposable;

				if ((m->m_flags & M_EXT)
				 && (m->m_ext.ext_type == EXT_DISPOSABLE))
					disposable = 1;
				else
					disposable = 0;

				error = uiomoveco(mtod(m, char *) + moff,
						  (int)len, uio,
						  disposable);
			} else
#endif /* ZERO_COPY_SOCKETS */
			error = uiomove(mtod(m, char *) + moff, (int)len, uio);
			SOCKBUF_LOCK(&so->so_rcv);
			if (error) {
				/*
				 * The MT_SONAME mbuf has already been removed
				 * from the record, so it is necessary to
				 * remove the data mbufs, if any, to preserve
				 * the invariant in the case of PR_ADDR that
				 * requires MT_SONAME mbufs at the head of
				 * each record.
				 */
				if (m && pr->pr_flags & PR_ATOMIC &&
				    ((flags & MSG_PEEK) == 0))
					(void)sbdroprecord_locked(&so->so_rcv);
				SOCKBUF_UNLOCK(&so->so_rcv);
				goto release;
			}
		} else
			uio->uio_resid -= len;
		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
		if (len == m->m_len - moff) {
			if (m->m_flags & M_EOR)
				flags |= MSG_EOR;
			if (flags & MSG_PEEK) {
				m = m->m_next;
				moff = 0;
			} else {
				nextrecord = m->m_nextpkt;
				sbfree(&so->so_rcv, m);
				if (mp != NULL) {
					*mp = m;
					mp = &m->m_next;
					so->so_rcv.sb_mb = m = m->m_next;
					*mp = NULL;
				} else {
					so->so_rcv.sb_mb = m_free(m);
					m = so->so_rcv.sb_mb;
				}
				sockbuf_pushsync(&so->so_rcv, nextrecord);
				SBLASTRECORDCHK(&so->so_rcv);
				SBLASTMBUFCHK(&so->so_rcv);
			}
		} else {
			if (flags & MSG_PEEK)
				moff += len;
			else {
				if (mp != NULL) {
					int copy_flag;

					if (flags & MSG_DONTWAIT)
						copy_flag = M_DONTWAIT;
					else
						copy_flag = M_WAIT;
					if (copy_flag == M_WAIT)
						SOCKBUF_UNLOCK(&so->so_rcv);
					*mp = m_copym(m, 0, len, copy_flag);
					if (copy_flag == M_WAIT)
						SOCKBUF_LOCK(&so->so_rcv);
					if (*mp == NULL) {
						/*
						 * m_copym() couldn't
						 * allocate an mbuf.  Adjust
						 * uio_resid back (it was
						 * adjusted down by len
						 * bytes, which we didn't end
						 * up "copying" over).
						 */
						uio->uio_resid += len;
						break;
					}
				}
				m->m_data += len;
				m->m_len -= len;
				so->so_rcv.sb_cc -= len;
			}
		}
		SOCKBUF_LOCK_ASSERT(&so->so_rcv);
		if (so->so_oobmark) {
			if ((flags & MSG_PEEK) == 0) {
				so->so_oobmark -= len;
				if (so->so_oobmark == 0) {
					so->so_rcv.sb_state |= SBS_RCVATMARK;
					break;
				}
			} else {
				offset += len;
				if (offset == so->so_oobmark)
					break;
			}
		}
		if (flags & MSG_EOR)
			break;
		/*
		 * If the MSG_WAITALL flag is set (for non-atomic socket), we
		 * must not quit until "uio->uio_resid == 0" or an error
		 * termination.  If a signal/timeout occurs, return with a
		 * short count but without error.  Keep sockbuf locked
		 * against other readers.
		 */
		while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
		    !sosendallatonce(so) && nextrecord == NULL) {
			SOCKBUF_LOCK_ASSERT(&so->so_rcv);
			if (so->so_error || so->so_rcv.sb_state & SBS_CANTRCVMORE)
				break;
			/*
			 * Notify the protocol that some data has been
			 * drained before blocking.
			 */
			if (pr->pr_flags & PR_WANTRCVD) {
				SOCKBUF_UNLOCK(&so->so_rcv);
				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
				SOCKBUF_LOCK(&so->so_rcv);
			}
			SBLASTRECORDCHK(&so->so_rcv);
			SBLASTMBUFCHK(&so->so_rcv);
			error = sbwait(&so->so_rcv);
			if (error) {
				SOCKBUF_UNLOCK(&so->so_rcv);
				goto release;
			}
			m = so->so_rcv.sb_mb;
			if (m != NULL)
				nextrecord = m->m_nextpkt;
		}
	}

	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
		flags |= MSG_TRUNC;
		if ((flags & MSG_PEEK) == 0)
			(void) sbdroprecord_locked(&so->so_rcv);
	}
	if ((flags & MSG_PEEK) == 0) {
		if (m == NULL) {
			/*
			 * First part is an inline SB_EMPTY_FIXUP().  Second
			 * part makes sure sb_lastrecord is up-to-date if
			 * there is still data in the socket buffer.
			 */
			so->so_rcv.sb_mb = nextrecord;
			if (so->so_rcv.sb_mb == NULL) {
				so->so_rcv.sb_mbtail = NULL;
				so->so_rcv.sb_lastrecord = NULL;
			} else if (nextrecord->m_nextpkt == NULL)
				so->so_rcv.sb_lastrecord = nextrecord;
		}
		SBLASTRECORDCHK(&so->so_rcv);
		SBLASTMBUFCHK(&so->so_rcv);
		/*
		 * If soreceive() is being done from the socket callback,
		 * then don't need to generate ACK to peer to update window,
		 * since ACK will be generated on return to TCP.
		 */
		if (!(flags & MSG_SOCALLBCK) &&
		    (pr->pr_flags & PR_WANTRCVD)) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			(*pr->pr_usrreqs->pru_rcvd)(so, flags);
			SOCKBUF_LOCK(&so->so_rcv);
		}
	}
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
	if (orig_resid == uio->uio_resid && orig_resid &&
	    (flags & MSG_EOR) == 0 && (so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0) {
		SOCKBUF_UNLOCK(&so->so_rcv);
		goto restart;
	}
	SOCKBUF_UNLOCK(&so->so_rcv);

	if (flagsp != NULL)
		*flagsp |= flags;
release:
	sbunlock(&so->so_rcv);
	return (error);
}

/*
 * Optimized version of soreceive() for simple datagram cases from userspace.
 * Unlike in the stream case, we're able to drop a datagram if copyout()
 * fails, and because we handle datagrams atomically, we don't need to use a
 * sleep lock to prevent I/O interlacing.
 */
int
soreceive_dgram(struct socket *so, struct sockaddr **psa, struct uio *uio,
    struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{
	struct mbuf *m, *m2;
	int flags, len, error;
	struct protosw *pr = so->so_proto;
	struct mbuf *nextrecord;

	if (psa != NULL)
		*psa = NULL;
	if (controlp != NULL)
		*controlp = NULL;
	if (flagsp != NULL)
		flags = *flagsp &~ MSG_EOR;
	else
		flags = 0;

	/*
	 * For any complicated cases, fall back to the full
	 * soreceive_generic().
	 */
	if (mp0 != NULL || (flags & MSG_PEEK) || (flags & MSG_OOB))
		return (soreceive_generic(so, psa, uio, mp0, controlp,
		    flagsp));

	/*
	 * Enforce restrictions on use.
	 */
	KASSERT((pr->pr_flags & PR_WANTRCVD) == 0,
	    ("soreceive_dgram: wantrcvd"));
	KASSERT(pr->pr_flags & PR_ATOMIC, ("soreceive_dgram: !atomic"));
	KASSERT((so->so_rcv.sb_state & SBS_RCVATMARK) == 0,
	    ("soreceive_dgram: SBS_RCVATMARK"));
	KASSERT((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0,
	    ("soreceive_dgram: PR_CONNREQUIRED"));

	/*
	 * Loop blocking while waiting for a datagram.
	 */
	SOCKBUF_LOCK(&so->so_rcv);
	while ((m = so->so_rcv.sb_mb) == NULL) {
		KASSERT(so->so_rcv.sb_cc == 0,
		    ("soreceive_dgram: sb_mb NULL but sb_cc %u",
		    so->so_rcv.sb_cc));
		if (so->so_error) {
			error = so->so_error;
			so->so_error = 0;
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (error);
		}
		if (so->so_rcv.sb_state & SBS_CANTRCVMORE ||
		    uio->uio_resid == 0) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (0);
		}
		if ((so->so_state & SS_NBIO) ||
		    (flags & (MSG_DONTWAIT|MSG_NBIO))) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (EWOULDBLOCK);
		}
		SBLASTRECORDCHK(&so->so_rcv);
		SBLASTMBUFCHK(&so->so_rcv);
		error = sbwait(&so->so_rcv);
		if (error) {
			SOCKBUF_UNLOCK(&so->so_rcv);
			return (error);
		}
	}
	SOCKBUF_LOCK_ASSERT(&so->so_rcv);

	if (uio->uio_td)
		uio->uio_td->td_ru.ru_msgrcv++;
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);
	nextrecord = m->m_nextpkt;
	if (nextrecord == NULL) {
		KASSERT(so->so_rcv.sb_lastrecord == m,
		    ("soreceive_dgram: lastrecord != m"));
	}

	KASSERT(so->so_rcv.sb_mb->m_nextpkt == nextrecord,
	    ("soreceive_dgram: m_nextpkt != nextrecord"));

	/*
	 * Pull 'm' and its chain off the front of the packet queue.
	 */
	so->so_rcv.sb_mb = NULL;
	sockbuf_pushsync(&so->so_rcv, nextrecord);

	/*
	 * Walk 'm's chain and free that many bytes from the socket buffer.
	 */
	for (m2 = m; m2 != NULL; m2 = m2->m_next)
		sbfree(&so->so_rcv, m2);

	/*
	 * Do a few last checks before we let go of the lock.
	 */
	SBLASTRECORDCHK(&so->so_rcv);
	SBLASTMBUFCHK(&so->so_rcv);
	SOCKBUF_UNLOCK(&so->so_rcv);

	if (pr->pr_flags & PR_ADDR) {
		KASSERT(m->m_type == MT_SONAME,
		    ("m->m_type == %d", m->m_type));
		if (psa != NULL)
			*psa = sodupsockaddr(mtod(m, struct sockaddr *),
			    M_NOWAIT);
		m = m_free(m);
	}
	if (m == NULL) {
		/* XXXRW: Can this happen? */
		return (0);
	}

	/*
	 * Packet to copyout() is now in 'm' and it is disconnected from the
	 * queue.
	 *
	 * Process one or more MT_CONTROL mbufs present before any data mbufs
	 * in the first mbuf chain on the socket buffer.  We call into the
	 * protocol to perform externalization (or freeing if controlp ==
	 * NULL).
	 */
	if (m->m_type == MT_CONTROL) {
		struct mbuf *cm = NULL, *cmn;
		struct mbuf **cme = &cm;

		do {
			m2 = m->m_next;
			m->m_next = NULL;
			*cme = m;
			cme = &(*cme)->m_next;
			m = m2;
		} while (m != NULL && m->m_type == MT_CONTROL);
		while (cm != NULL) {
			cmn = cm->m_next;
			cm->m_next = NULL;
			if (pr->pr_domain->dom_externalize != NULL) {
				error = (*pr->pr_domain->dom_externalize)
				    (cm, controlp);
			} else if (controlp != NULL)
				*controlp = cm;
			else
				m_freem(cm);
			if (controlp != NULL) {
				while (*controlp != NULL)
					controlp = &(*controlp)->m_next;
			}
			cm = cmn;
		}
	}
	KASSERT(m->m_type == MT_DATA, ("soreceive_dgram: !data"));

	while (m != NULL && uio->uio_resid > 0) {
		len = uio->uio_resid;
		if (len > m->m_len)
			len = m->m_len;
		error = uiomove(mtod(m, char *), (int)len, uio);
		if (error) {
			m_freem(m);
			return (error);
		}
		m = m_free(m);
	}
	if (m != NULL)
		flags |= MSG_TRUNC;
	m_freem(m);
	if (flagsp != NULL)
		*flagsp |= flags;
	return (0);
}

int
soreceive(struct socket *so, struct sockaddr **psa, struct uio *uio,
    struct mbuf **mp0, struct mbuf **controlp, int *flagsp)
{

	return (so->so_proto->pr_usrreqs->pru_soreceive(so, psa, uio, mp0,
	    controlp, flagsp));
}

int
soshutdown(struct socket *so, int how)
{
	struct protosw *pr = so->so_proto;
	int error;

	if (!(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR))
		return (EINVAL);
	if (pr->pr_usrreqs->pru_flush != NULL) {
		(*pr->pr_usrreqs->pru_flush)(so, how);
	}
	if (how != SHUT_WR)
		sorflush(so);
	if (how != SHUT_RD) {
		CURVNET_SET(so->so_vnet);
		error = (*pr->pr_usrreqs->pru_shutdown)(so);
		CURVNET_RESTORE();
		return (error);
	}
	return (0);
}

void
sorflush(struct socket *so)
{
	struct sockbuf *sb = &so->so_rcv;
	struct protosw *pr = so->so_proto;
	struct sockbuf asb;

	/*
	 * In order to avoid calling dom_dispose with the socket buffer mutex
	 * held, and in order to generally avoid holding the lock for a long
	 * time, we make a copy of the socket buffer and clear the original
	 * (except locks, state).  The new socket buffer copy won't have
	 * initialized locks so we can only call routines that won't use or
	 * assert those locks.
	 *
	 * Dislodge threads currently blocked in receive and wait to acquire
	 * a lock against other simultaneous readers before clearing the
	 * socket buffer.  Don't let our acquire be interrupted by a signal
	 * despite any existing socket disposition on interruptible waiting.
	 */
	CURVNET_SET(so->so_vnet);
	socantrcvmore(so);
	(void) sblock(sb, SBL_WAIT | SBL_NOINTR);

	/*
	 * Invalidate/clear most of the sockbuf structure, but leave selinfo
	 * and mutex data unchanged.
	 */
	SOCKBUF_LOCK(sb);
	bzero(&asb, offsetof(struct sockbuf, sb_startzero));
	bcopy(&sb->sb_startzero, &asb.sb_startzero,
	    sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
	bzero(&sb->sb_startzero,
	    sizeof(*sb) - offsetof(struct sockbuf, sb_startzero));
	SOCKBUF_UNLOCK(sb);
	sbunlock(sb);

	/*
	 * Dispose of special rights and flush the socket buffer.  Don't call
	 * any unsafe routines (that rely on locks being initialized) on asb.
	 */
	if (pr->pr_flags & PR_RIGHTS && pr->pr_domain->dom_dispose != NULL)
		(*pr->pr_domain->dom_dispose)(asb.sb_mb);
	sbrelease_internal(&asb, so);
	CURVNET_RESTORE();
}

/*
 * Perhaps this routine, and sooptcopyout(), below, ought to come in an
 * additional variant to handle the case where the option value needs to be
 * some kind of integer, but not a specific size.  In addition to their use
 * here, these functions are also called by the protocol-level pr_ctloutput()
 * routines.
 */
int
sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen)
{
	size_t	valsize;

	/*
	 * If the user gives us more than we wanted, we ignore it, but if we
	 * don't get the minimum length the caller wants, we return EINVAL.
	 * On success, sopt->sopt_valsize is set to however much we actually
	 * retrieved.
	 */
	if ((valsize = sopt->sopt_valsize) < minlen)
		return EINVAL;
	if (valsize > len)
		sopt->sopt_valsize = valsize = len;

	if (sopt->sopt_td != NULL)
		return (copyin(sopt->sopt_val, buf, valsize));

	bcopy(sopt->sopt_val, buf, valsize);
	return (0);
}

/*
 * Kernel version of setsockopt(2).
 *
 * XXX: optlen is size_t, not socklen_t
 */
int
so_setsockopt(struct socket *so, int level, int optname, void *optval,
    size_t optlen)
{
	struct sockopt sopt;

	sopt.sopt_level = level;
	sopt.sopt_name = optname;
	sopt.sopt_dir = SOPT_SET;
	sopt.sopt_val = optval;
	sopt.sopt_valsize = optlen;
	sopt.sopt_td = NULL;
	return (sosetopt(so, &sopt));
}

int
sosetopt(struct socket *so, struct sockopt *sopt)
{
	int	error, optval;
	struct	linger l;
	struct	timeval tv;
	u_long	val;
#ifdef MAC
	struct mac extmac;
#endif

	error = 0;
	if (sopt->sopt_level != SOL_SOCKET) {
		if (so->so_proto && so->so_proto->pr_ctloutput)
			return ((*so->so_proto->pr_ctloutput)
				  (so, sopt));
		error = ENOPROTOOPT;
	} else {
		switch (sopt->sopt_name) {
#ifdef INET
		case SO_ACCEPTFILTER:
			error = do_setopt_accept_filter(so, sopt);
			if (error)
				goto bad;
			break;
#endif
		case SO_LINGER:
			error = sooptcopyin(sopt, &l, sizeof l, sizeof l);
			if (error)
				goto bad;

			SOCK_LOCK(so);
			so->so_linger = l.l_linger;
			if (l.l_onoff)
				so->so_options |= SO_LINGER;
			else
				so->so_options &= ~SO_LINGER;
			SOCK_UNLOCK(so);
			break;

		case SO_DEBUG:
		case SO_KEEPALIVE:
		case SO_DONTROUTE:
		case SO_USELOOPBACK:
		case SO_BROADCAST:
		case SO_REUSEADDR:
		case SO_REUSEPORT:
		case SO_OOBINLINE:
		case SO_TIMESTAMP:
		case SO_BINTIME:
		case SO_NOSIGPIPE:
		case SO_NO_DDP:
		case SO_NO_OFFLOAD:
			error = sooptcopyin(sopt, &optval, sizeof optval,
					    sizeof optval);
			if (error)
				goto bad;
			SOCK_LOCK(so);
			if (optval)
				so->so_options |= sopt->sopt_name;
			else
				so->so_options &= ~sopt->sopt_name;
			SOCK_UNLOCK(so);
			break;

		case SO_SETFIB:
			error = sooptcopyin(sopt, &optval, sizeof optval,
					    sizeof
					    optval);
			if (optval < 1 || optval > rt_numfibs) {
				error = EINVAL;
				goto bad;
			}
			if ((so->so_proto->pr_domain->dom_family == PF_INET) ||
			    (so->so_proto->pr_domain->dom_family == PF_ROUTE)) {
				so->so_fibnum = optval;
				/* Note: ignore error */
				if (so->so_proto && so->so_proto->pr_ctloutput)
					(*so->so_proto->pr_ctloutput)(so, sopt);
			} else {
				so->so_fibnum = 0;
			}
			break;
		case SO_SNDBUF:
		case SO_RCVBUF:
		case SO_SNDLOWAT:
		case SO_RCVLOWAT:
			error = sooptcopyin(sopt, &optval, sizeof optval,
					    sizeof optval);
			if (error)
				goto bad;

			/*
			 * Values < 1 make no sense for any of these options,
			 * so disallow them.
			 */
			if (optval < 1) {
				error = EINVAL;
				goto bad;
			}

			switch (sopt->sopt_name) {
			case SO_SNDBUF:
			case SO_RCVBUF:
				if (sbreserve(sopt->sopt_name == SO_SNDBUF ?
				    &so->so_snd : &so->so_rcv, (u_long)optval,
				    so, curthread) == 0) {
					error = ENOBUFS;
					goto bad;
				}
				(sopt->sopt_name == SO_SNDBUF ? &so->so_snd :
				    &so->so_rcv)->sb_flags &= ~SB_AUTOSIZE;
				break;

			/*
			 * Make sure the low-water is never greater than the
			 * high-water.
			 */
			case SO_SNDLOWAT:
				SOCKBUF_LOCK(&so->so_snd);
				so->so_snd.sb_lowat =
				    (optval > so->so_snd.sb_hiwat) ?
				    so->so_snd.sb_hiwat : optval;
				SOCKBUF_UNLOCK(&so->so_snd);
				break;
			case SO_RCVLOWAT:
				SOCKBUF_LOCK(&so->so_rcv);
				so->so_rcv.sb_lowat =
				    (optval > so->so_rcv.sb_hiwat) ?
				    so->so_rcv.sb_hiwat : optval;
				SOCKBUF_UNLOCK(&so->so_rcv);
				break;
			}
			break;

		case SO_SNDTIMEO:
		case SO_RCVTIMEO:
#ifdef COMPAT_IA32
			if (SV_CURPROC_FLAG(SV_ILP32)) {
				struct timeval32 tv32;

				error = sooptcopyin(sopt, &tv32, sizeof tv32,
				    sizeof tv32);
				CP(tv32, tv, tv_sec);
				CP(tv32, tv, tv_usec);
			} else
#endif
				error = sooptcopyin(sopt, &tv, sizeof tv,
				    sizeof tv);
			if (error)
				goto bad;

			/* assert(hz > 0); */
			if (tv.tv_sec < 0 || tv.tv_sec > INT_MAX / hz ||
			    tv.tv_usec < 0 || tv.tv_usec >= 1000000) {
				error = EDOM;
				goto bad;
			}
			/* assert(tick > 0); */
			/* assert(ULONG_MAX - INT_MAX >= 1000000); */
			val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick;
			if (val > INT_MAX) {
				error = EDOM;
				goto bad;
			}
			if (val == 0 && tv.tv_usec != 0)
				val = 1;

			switch (sopt->sopt_name) {
			case SO_SNDTIMEO:
				so->so_snd.sb_timeo = val;
				break;
			case SO_RCVTIMEO:
				so->so_rcv.sb_timeo = val;
				break;
			}
			break;

		case SO_LABEL:
#ifdef MAC
			error = sooptcopyin(sopt, &extmac, sizeof extmac,
			    sizeof extmac);
			if (error)
				goto bad;
			error = mac_setsockopt_label(sopt->sopt_td->td_ucred,
			    so, &extmac);
#else
			error = EOPNOTSUPP;
#endif
			break;

		default:
			error = ENOPROTOOPT;
			break;
		}
		if (error == 0 && so->so_proto != NULL &&
		    so->so_proto->pr_ctloutput != NULL) {
			(void) ((*so->so_proto->pr_ctloutput)
				  (so, sopt));
		}
	}
bad:
	return (error);
}

/*
 * Helper routine for getsockopt.
 */
int
sooptcopyout(struct sockopt *sopt, const void *buf, size_t len)
{
	int	error;
	size_t	valsize;

	error = 0;

	/*
	 * Documented get behavior is that we always return a value, possibly
	 * truncated to fit in the user's buffer.
Traditional behavior is 2392 * that we always tell the user precisely how much we copied, rather 2393 * than something useful like the total amount we had available for 2394 * her. Note that this interface is not idempotent; the entire 2395 * answer must generated ahead of time. 2396 */ 2397 valsize = min(len, sopt->sopt_valsize); 2398 sopt->sopt_valsize = valsize; 2399 if (sopt->sopt_val != NULL) { 2400 if (sopt->sopt_td != NULL) 2401 error = copyout(buf, sopt->sopt_val, valsize); 2402 else 2403 bcopy(buf, sopt->sopt_val, valsize); 2404 } 2405 return (error); 2406} 2407 2408int 2409sogetopt(struct socket *so, struct sockopt *sopt) 2410{ 2411 int error, optval; 2412 struct linger l; 2413 struct timeval tv; 2414#ifdef MAC 2415 struct mac extmac; 2416#endif 2417 2418 error = 0; 2419 if (sopt->sopt_level != SOL_SOCKET) { 2420 if (so->so_proto && so->so_proto->pr_ctloutput) { 2421 return ((*so->so_proto->pr_ctloutput) 2422 (so, sopt)); 2423 } else 2424 return (ENOPROTOOPT); 2425 } else { 2426 switch (sopt->sopt_name) { 2427#ifdef INET 2428 case SO_ACCEPTFILTER: 2429 error = do_getopt_accept_filter(so, sopt); 2430 break; 2431#endif 2432 case SO_LINGER: 2433 SOCK_LOCK(so); 2434 l.l_onoff = so->so_options & SO_LINGER; 2435 l.l_linger = so->so_linger; 2436 SOCK_UNLOCK(so); 2437 error = sooptcopyout(sopt, &l, sizeof l); 2438 break; 2439 2440 case SO_USELOOPBACK: 2441 case SO_DONTROUTE: 2442 case SO_DEBUG: 2443 case SO_KEEPALIVE: 2444 case SO_REUSEADDR: 2445 case SO_REUSEPORT: 2446 case SO_BROADCAST: 2447 case SO_OOBINLINE: 2448 case SO_ACCEPTCONN: 2449 case SO_TIMESTAMP: 2450 case SO_BINTIME: 2451 case SO_NOSIGPIPE: 2452 optval = so->so_options & sopt->sopt_name; 2453integer: 2454 error = sooptcopyout(sopt, &optval, sizeof optval); 2455 break; 2456 2457 case SO_TYPE: 2458 optval = so->so_type; 2459 goto integer; 2460 2461 case SO_ERROR: 2462 SOCK_LOCK(so); 2463 optval = so->so_error; 2464 so->so_error = 0; 2465 SOCK_UNLOCK(so); 2466 goto integer; 2467 2468 case SO_SNDBUF: 
2469 optval = so->so_snd.sb_hiwat; 2470 goto integer; 2471 2472 case SO_RCVBUF: 2473 optval = so->so_rcv.sb_hiwat; 2474 goto integer; 2475 2476 case SO_SNDLOWAT: 2477 optval = so->so_snd.sb_lowat; 2478 goto integer; 2479 2480 case SO_RCVLOWAT: 2481 optval = so->so_rcv.sb_lowat; 2482 goto integer; 2483 2484 case SO_SNDTIMEO: 2485 case SO_RCVTIMEO: 2486 optval = (sopt->sopt_name == SO_SNDTIMEO ? 2487 so->so_snd.sb_timeo : so->so_rcv.sb_timeo); 2488 2489 tv.tv_sec = optval / hz; 2490 tv.tv_usec = (optval % hz) * tick; 2491#ifdef COMPAT_IA32 2492 if (SV_CURPROC_FLAG(SV_ILP32)) { 2493 struct timeval32 tv32; 2494 2495 CP(tv, tv32, tv_sec); 2496 CP(tv, tv32, tv_usec); 2497 error = sooptcopyout(sopt, &tv32, sizeof tv32); 2498 } else 2499#endif 2500 error = sooptcopyout(sopt, &tv, sizeof tv); 2501 break; 2502 2503 case SO_LABEL: 2504#ifdef MAC 2505 error = sooptcopyin(sopt, &extmac, sizeof(extmac), 2506 sizeof(extmac)); 2507 if (error) 2508 return (error); 2509 error = mac_getsockopt_label(sopt->sopt_td->td_ucred, 2510 so, &extmac); 2511 if (error) 2512 return (error); 2513 error = sooptcopyout(sopt, &extmac, sizeof extmac); 2514#else 2515 error = EOPNOTSUPP; 2516#endif 2517 break; 2518 2519 case SO_PEERLABEL: 2520#ifdef MAC 2521 error = sooptcopyin(sopt, &extmac, sizeof(extmac), 2522 sizeof(extmac)); 2523 if (error) 2524 return (error); 2525 error = mac_getsockopt_peerlabel( 2526 sopt->sopt_td->td_ucred, so, &extmac); 2527 if (error) 2528 return (error); 2529 error = sooptcopyout(sopt, &extmac, sizeof extmac); 2530#else 2531 error = EOPNOTSUPP; 2532#endif 2533 break; 2534 2535 case SO_LISTENQLIMIT: 2536 optval = so->so_qlimit; 2537 goto integer; 2538 2539 case SO_LISTENQLEN: 2540 optval = so->so_qlen; 2541 goto integer; 2542 2543 case SO_LISTENINCQLEN: 2544 optval = so->so_incqlen; 2545 goto integer; 2546 2547 default: 2548 error = ENOPROTOOPT; 2549 break; 2550 } 2551 return (error); 2552 } 2553} 2554 2555/* XXX; prepare mbuf for (__FreeBSD__ < 3) routines. 
*/ 2556int 2557soopt_getm(struct sockopt *sopt, struct mbuf **mp) 2558{ 2559 struct mbuf *m, *m_prev; 2560 int sopt_size = sopt->sopt_valsize; 2561 2562 MGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT, MT_DATA); 2563 if (m == NULL) 2564 return ENOBUFS; 2565 if (sopt_size > MLEN) { 2566 MCLGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT); 2567 if ((m->m_flags & M_EXT) == 0) { 2568 m_free(m); 2569 return ENOBUFS; 2570 } 2571 m->m_len = min(MCLBYTES, sopt_size); 2572 } else { 2573 m->m_len = min(MLEN, sopt_size); 2574 } 2575 sopt_size -= m->m_len; 2576 *mp = m; 2577 m_prev = m; 2578 2579 while (sopt_size) { 2580 MGET(m, sopt->sopt_td ? M_WAIT : M_DONTWAIT, MT_DATA); 2581 if (m == NULL) { 2582 m_freem(*mp); 2583 return ENOBUFS; 2584 } 2585 if (sopt_size > MLEN) { 2586 MCLGET(m, sopt->sopt_td != NULL ? M_WAIT : 2587 M_DONTWAIT); 2588 if ((m->m_flags & M_EXT) == 0) { 2589 m_freem(m); 2590 m_freem(*mp); 2591 return ENOBUFS; 2592 } 2593 m->m_len = min(MCLBYTES, sopt_size); 2594 } else { 2595 m->m_len = min(MLEN, sopt_size); 2596 } 2597 sopt_size -= m->m_len; 2598 m_prev->m_next = m; 2599 m_prev = m; 2600 } 2601 return (0); 2602} 2603 2604/* XXX; copyin sopt data into mbuf chain for (__FreeBSD__ < 3) routines. 
*/ 2605int 2606soopt_mcopyin(struct sockopt *sopt, struct mbuf *m) 2607{ 2608 struct mbuf *m0 = m; 2609 2610 if (sopt->sopt_val == NULL) 2611 return (0); 2612 while (m != NULL && sopt->sopt_valsize >= m->m_len) { 2613 if (sopt->sopt_td != NULL) { 2614 int error; 2615 2616 error = copyin(sopt->sopt_val, mtod(m, char *), 2617 m->m_len); 2618 if (error != 0) { 2619 m_freem(m0); 2620 return(error); 2621 } 2622 } else 2623 bcopy(sopt->sopt_val, mtod(m, char *), m->m_len); 2624 sopt->sopt_valsize -= m->m_len; 2625 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len; 2626 m = m->m_next; 2627 } 2628 if (m != NULL) /* should be allocated enoughly at ip6_sooptmcopyin() */ 2629 panic("ip6_sooptmcopyin"); 2630 return (0); 2631} 2632 2633/* XXX; copyout mbuf chain data into soopt for (__FreeBSD__ < 3) routines. */ 2634int 2635soopt_mcopyout(struct sockopt *sopt, struct mbuf *m) 2636{ 2637 struct mbuf *m0 = m; 2638 size_t valsize = 0; 2639 2640 if (sopt->sopt_val == NULL) 2641 return (0); 2642 while (m != NULL && sopt->sopt_valsize >= m->m_len) { 2643 if (sopt->sopt_td != NULL) { 2644 int error; 2645 2646 error = copyout(mtod(m, char *), sopt->sopt_val, 2647 m->m_len); 2648 if (error != 0) { 2649 m_freem(m0); 2650 return(error); 2651 } 2652 } else 2653 bcopy(mtod(m, char *), sopt->sopt_val, m->m_len); 2654 sopt->sopt_valsize -= m->m_len; 2655 sopt->sopt_val = (char *)sopt->sopt_val + m->m_len; 2656 valsize += m->m_len; 2657 m = m->m_next; 2658 } 2659 if (m != NULL) { 2660 /* enough soopt buffer should be given from user-land */ 2661 m_freem(m0); 2662 return(EINVAL); 2663 } 2664 sopt->sopt_valsize = valsize; 2665 return (0); 2666} 2667 2668/* 2669 * sohasoutofband(): protocol notifies socket layer of the arrival of new 2670 * out-of-band data, which will then notify socket consumers. 
2671 */ 2672void 2673sohasoutofband(struct socket *so) 2674{ 2675 2676 if (so->so_sigio != NULL) 2677 pgsigio(&so->so_sigio, SIGURG, 0); 2678 selwakeuppri(&so->so_rcv.sb_sel, PSOCK); 2679} 2680 2681int 2682sopoll(struct socket *so, int events, struct ucred *active_cred, 2683 struct thread *td) 2684{ 2685 2686 return (so->so_proto->pr_usrreqs->pru_sopoll(so, events, active_cred, 2687 td)); 2688} 2689 2690int 2691sopoll_generic(struct socket *so, int events, struct ucred *active_cred, 2692 struct thread *td) 2693{ 2694 int revents = 0; 2695 2696 SOCKBUF_LOCK(&so->so_snd); 2697 SOCKBUF_LOCK(&so->so_rcv); 2698 if (events & (POLLIN | POLLRDNORM)) 2699 if (soreadable(so)) 2700 revents |= events & (POLLIN | POLLRDNORM); 2701 2702 if (events & POLLINIGNEOF) 2703 if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat || 2704 !TAILQ_EMPTY(&so->so_comp) || so->so_error) 2705 revents |= POLLINIGNEOF; 2706 2707 if (events & (POLLOUT | POLLWRNORM)) 2708 if (sowriteable(so)) 2709 revents |= events & (POLLOUT | POLLWRNORM); 2710 2711 if (events & (POLLPRI | POLLRDBAND)) 2712 if (so->so_oobmark || (so->so_rcv.sb_state & SBS_RCVATMARK)) 2713 revents |= events & (POLLPRI | POLLRDBAND); 2714 2715 if (revents == 0) { 2716 if (events & 2717 (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | 2718 POLLRDBAND)) { 2719 selrecord(td, &so->so_rcv.sb_sel); 2720 so->so_rcv.sb_flags |= SB_SEL; 2721 } 2722 2723 if (events & (POLLOUT | POLLWRNORM)) { 2724 selrecord(td, &so->so_snd.sb_sel); 2725 so->so_snd.sb_flags |= SB_SEL; 2726 } 2727 } 2728 2729 SOCKBUF_UNLOCK(&so->so_rcv); 2730 SOCKBUF_UNLOCK(&so->so_snd); 2731 return (revents); 2732} 2733 2734int 2735soo_kqfilter(struct file *fp, struct knote *kn) 2736{ 2737 struct socket *so = kn->kn_fp->f_data; 2738 struct sockbuf *sb; 2739 2740 switch (kn->kn_filter) { 2741 case EVFILT_READ: 2742 if (so->so_options & SO_ACCEPTCONN) 2743 kn->kn_fop = &solisten_filtops; 2744 else 2745 kn->kn_fop = &soread_filtops; 2746 sb = &so->so_rcv; 2747 break; 2748 
case EVFILT_WRITE: 2749 kn->kn_fop = &sowrite_filtops; 2750 sb = &so->so_snd; 2751 break; 2752 default: 2753 return (EINVAL); 2754 } 2755 2756 SOCKBUF_LOCK(sb); 2757 knlist_add(&sb->sb_sel.si_note, kn, 1); 2758 sb->sb_flags |= SB_KNOTE; 2759 SOCKBUF_UNLOCK(sb); 2760 return (0); 2761} 2762 2763/* 2764 * Some routines that return EOPNOTSUPP for entry points that are not 2765 * supported by a protocol. Fill in as needed. 2766 */ 2767int 2768pru_accept_notsupp(struct socket *so, struct sockaddr **nam) 2769{ 2770 2771 return EOPNOTSUPP; 2772} 2773 2774int 2775pru_attach_notsupp(struct socket *so, int proto, struct thread *td) 2776{ 2777 2778 return EOPNOTSUPP; 2779} 2780 2781int 2782pru_bind_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td) 2783{ 2784 2785 return EOPNOTSUPP; 2786} 2787 2788int 2789pru_connect_notsupp(struct socket *so, struct sockaddr *nam, struct thread *td) 2790{ 2791 2792 return EOPNOTSUPP; 2793} 2794 2795int 2796pru_connect2_notsupp(struct socket *so1, struct socket *so2) 2797{ 2798 2799 return EOPNOTSUPP; 2800} 2801 2802int 2803pru_control_notsupp(struct socket *so, u_long cmd, caddr_t data, 2804 struct ifnet *ifp, struct thread *td) 2805{ 2806 2807 return EOPNOTSUPP; 2808} 2809 2810int 2811pru_disconnect_notsupp(struct socket *so) 2812{ 2813 2814 return EOPNOTSUPP; 2815} 2816 2817int 2818pru_listen_notsupp(struct socket *so, int backlog, struct thread *td) 2819{ 2820 2821 return EOPNOTSUPP; 2822} 2823 2824int 2825pru_peeraddr_notsupp(struct socket *so, struct sockaddr **nam) 2826{ 2827 2828 return EOPNOTSUPP; 2829} 2830 2831int 2832pru_rcvd_notsupp(struct socket *so, int flags) 2833{ 2834 2835 return EOPNOTSUPP; 2836} 2837 2838int 2839pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags) 2840{ 2841 2842 return EOPNOTSUPP; 2843} 2844 2845int 2846pru_send_notsupp(struct socket *so, int flags, struct mbuf *m, 2847 struct sockaddr *addr, struct mbuf *control, struct thread *td) 2848{ 2849 2850 return EOPNOTSUPP; 2851} 2852 2853/* 2854 * This 
isn't really a ``null'' operation, but it's the default one and 2855 * doesn't do anything destructive. 2856 */ 2857int 2858pru_sense_null(struct socket *so, struct stat *sb) 2859{ 2860 2861 sb->st_blksize = so->so_snd.sb_hiwat; 2862 return 0; 2863} 2864 2865int 2866pru_shutdown_notsupp(struct socket *so) 2867{ 2868 2869 return EOPNOTSUPP; 2870} 2871 2872int 2873pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam) 2874{ 2875 2876 return EOPNOTSUPP; 2877} 2878 2879int 2880pru_sosend_notsupp(struct socket *so, struct sockaddr *addr, struct uio *uio, 2881 struct mbuf *top, struct mbuf *control, int flags, struct thread *td) 2882{ 2883 2884 return EOPNOTSUPP; 2885} 2886 2887int 2888pru_soreceive_notsupp(struct socket *so, struct sockaddr **paddr, 2889 struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp) 2890{ 2891 2892 return EOPNOTSUPP; 2893} 2894 2895int 2896pru_sopoll_notsupp(struct socket *so, int events, struct ucred *cred, 2897 struct thread *td) 2898{ 2899 2900 return EOPNOTSUPP; 2901} 2902 2903static void 2904filt_sordetach(struct knote *kn) 2905{ 2906 struct socket *so = kn->kn_fp->f_data; 2907 2908 SOCKBUF_LOCK(&so->so_rcv); 2909 knlist_remove(&so->so_rcv.sb_sel.si_note, kn, 1); 2910 if (knlist_empty(&so->so_rcv.sb_sel.si_note)) 2911 so->so_rcv.sb_flags &= ~SB_KNOTE; 2912 SOCKBUF_UNLOCK(&so->so_rcv); 2913} 2914 2915/*ARGSUSED*/ 2916static int 2917filt_soread(struct knote *kn, long hint) 2918{ 2919 struct socket *so; 2920 2921 so = kn->kn_fp->f_data; 2922 SOCKBUF_LOCK_ASSERT(&so->so_rcv); 2923 2924 kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; 2925 if (so->so_rcv.sb_state & SBS_CANTRCVMORE) { 2926 kn->kn_flags |= EV_EOF; 2927 kn->kn_fflags = so->so_error; 2928 return (1); 2929 } else if (so->so_error) /* temporary udp error */ 2930 return (1); 2931 else if (kn->kn_sfflags & NOTE_LOWAT) 2932 return (kn->kn_data >= kn->kn_sdata); 2933 else 2934 return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat); 2935} 2936 2937static void 2938filt_sowdetach(struct 
knote *kn) 2939{ 2940 struct socket *so = kn->kn_fp->f_data; 2941 2942 SOCKBUF_LOCK(&so->so_snd); 2943 knlist_remove(&so->so_snd.sb_sel.si_note, kn, 1); 2944 if (knlist_empty(&so->so_snd.sb_sel.si_note)) 2945 so->so_snd.sb_flags &= ~SB_KNOTE; 2946 SOCKBUF_UNLOCK(&so->so_snd); 2947} 2948 2949/*ARGSUSED*/ 2950static int 2951filt_sowrite(struct knote *kn, long hint) 2952{ 2953 struct socket *so; 2954 2955 so = kn->kn_fp->f_data; 2956 SOCKBUF_LOCK_ASSERT(&so->so_snd); 2957 kn->kn_data = sbspace(&so->so_snd); 2958 if (so->so_snd.sb_state & SBS_CANTSENDMORE) { 2959 kn->kn_flags |= EV_EOF; 2960 kn->kn_fflags = so->so_error; 2961 return (1); 2962 } else if (so->so_error) /* temporary udp error */ 2963 return (1); 2964 else if (((so->so_state & SS_ISCONNECTED) == 0) && 2965 (so->so_proto->pr_flags & PR_CONNREQUIRED)) 2966 return (0); 2967 else if (kn->kn_sfflags & NOTE_LOWAT) 2968 return (kn->kn_data >= kn->kn_sdata); 2969 else 2970 return (kn->kn_data >= so->so_snd.sb_lowat); 2971} 2972 2973/*ARGSUSED*/ 2974static int 2975filt_solisten(struct knote *kn, long hint) 2976{ 2977 struct socket *so = kn->kn_fp->f_data; 2978 2979 kn->kn_data = so->so_qlen; 2980 return (! TAILQ_EMPTY(&so->so_comp)); 2981} 2982 2983int 2984socheckuid(struct socket *so, uid_t uid) 2985{ 2986 2987 if (so == NULL) 2988 return (EPERM); 2989 if (so->so_cred->cr_uid != uid) 2990 return (EPERM); 2991 return (0); 2992} 2993 2994static int 2995sysctl_somaxconn(SYSCTL_HANDLER_ARGS) 2996{ 2997 int error; 2998 int val; 2999 3000 val = somaxconn; 3001 error = sysctl_handle_int(oidp, &val, 0, req); 3002 if (error || !req->newptr ) 3003 return (error); 3004 3005 if (val < 1 || val > USHRT_MAX) 3006 return (EINVAL); 3007 3008 somaxconn = val; 3009 return (0); 3010} 3011 3012/* 3013 * These functions are used by protocols to notify the socket layer (and its 3014 * consumers) of state changes in the sockets driven by protocol-side events. 
3015 */ 3016 3017/* 3018 * Procedures to manipulate state flags of socket and do appropriate wakeups. 3019 * 3020 * Normal sequence from the active (originating) side is that 3021 * soisconnecting() is called during processing of connect() call, resulting 3022 * in an eventual call to soisconnected() if/when the connection is 3023 * established. When the connection is torn down soisdisconnecting() is 3024 * called during processing of disconnect() call, and soisdisconnected() is 3025 * called when the connection to the peer is totally severed. The semantics 3026 * of these routines are such that connectionless protocols can call 3027 * soisconnected() and soisdisconnected() only, bypassing the in-progress 3028 * calls when setting up a ``connection'' takes no time. 3029 * 3030 * From the passive side, a socket is created with two queues of sockets: 3031 * so_incomp for connections in progress and so_comp for connections already 3032 * made and awaiting user acceptance. As a protocol is preparing incoming 3033 * connections, it creates a socket structure queued on so_incomp by calling 3034 * sonewconn(). When the connection is established, soisconnected() is 3035 * called, and transfers the socket structure to so_comp, making it available 3036 * to accept(). 3037 * 3038 * If a socket is closed with sockets on either so_incomp or so_comp, these 3039 * sockets are dropped. 3040 * 3041 * If higher-level protocols are implemented in the kernel, the wakeups done 3042 * here will sometimes cause software-interrupt process scheduling. 3043 */ 3044void 3045soisconnecting(struct socket *so) 3046{ 3047 3048 SOCK_LOCK(so); 3049 so->so_state &= ~(SS_ISCONNECTED|SS_ISDISCONNECTING); 3050 so->so_state |= SS_ISCONNECTING; 3051 SOCK_UNLOCK(so); 3052} 3053 3054void 3055soisconnected(struct socket *so) 3056{
3057 struct socket *head;	3057 struct socket *head; 3058 int ret;
3058	3059
	3060restart:
3059 ACCEPT_LOCK(); 3060 SOCK_LOCK(so); 3061 so->so_state &= ~(SS_ISCONNECTING|SS_ISDISCONNECTING|SS_ISCONFIRMING); 3062 so->so_state |= SS_ISCONNECTED; 3063 head = so->so_head; 3064 if (head != NULL && (so->so_qstate & SQ_INCOMP)) { 3065 if ((so->so_options & SO_ACCEPTFILTER) == 0) { 3066 SOCK_UNLOCK(so); 3067 TAILQ_REMOVE(&head->so_incomp, so, so_list); 3068 head->so_incqlen--; 3069 so->so_qstate &= ~SQ_INCOMP; 3070 TAILQ_INSERT_TAIL(&head->so_comp, so, so_list); 3071 head->so_qlen++; 3072 so->so_qstate |= SQ_COMP; 3073 ACCEPT_UNLOCK(); 3074 sorwakeup(head); 3075 wakeup_one(&head->so_timeo); 3076 } else { 3077 ACCEPT_UNLOCK();	3061 ACCEPT_LOCK(); 3062 SOCK_LOCK(so); 3063 so->so_state &= ~(SS_ISCONNECTING|SS_ISDISCONNECTING|SS_ISCONFIRMING); 3064 so->so_state |= SS_ISCONNECTED; 3065 head = so->so_head; 3066 if (head != NULL && (so->so_qstate & SQ_INCOMP)) { 3067 if ((so->so_options & SO_ACCEPTFILTER) == 0) { 3068 SOCK_UNLOCK(so); 3069 TAILQ_REMOVE(&head->so_incomp, so, so_list); 3070 head->so_incqlen--; 3071 so->so_qstate &= ~SQ_INCOMP; 3072 TAILQ_INSERT_TAIL(&head->so_comp, so, so_list); 3073 head->so_qlen++; 3074 so->so_qstate |= SQ_COMP; 3075 ACCEPT_UNLOCK(); 3076 sorwakeup(head); 3077 wakeup_one(&head->so_timeo); 3078 } else { 3079 ACCEPT_UNLOCK();
3078 so->so_upcall = 3079 head->so_accf->so_accept_filter->accf_callback; 3080 so->so_upcallarg = head->so_accf->so_accept_filter_arg; 3081 so->so_rcv.sb_flags |= SB_UPCALL;	3080 soupcall_set(so, SO_RCV, 3081 head->so_accf->so_accept_filter->accf_callback, 3082 head->so_accf->so_accept_filter_arg);
3082 so->so_options &= ~SO_ACCEPTFILTER;	3083 so->so_options &= ~SO_ACCEPTFILTER;
	3084 ret = head->so_accf->so_accept_filter->accf_callback(so, 3085 head->so_accf->so_accept_filter_arg, M_DONTWAIT); 3086 if (ret == SU_ISCONNECTED) 3087 soupcall_clear(so, SO_RCV);
3083 SOCK_UNLOCK(so);	3088 SOCK_UNLOCK(so);
3084 so->so_upcall(so, so->so_upcallarg, M_DONTWAIT);	3089 if (ret == SU_ISCONNECTED) 3090 goto restart;
3085 } 3086 return; 3087 } 3088 SOCK_UNLOCK(so); 3089 ACCEPT_UNLOCK(); 3090 wakeup(&so->so_timeo); 3091 sorwakeup(so); 3092 sowwakeup(so); 3093} 3094 3095void 3096soisdisconnecting(struct socket *so) 3097{ 3098 3099 /* 3100 * Note: This code assumes that SOCK_LOCK(so) and 3101 * SOCKBUF_LOCK(&so->so_rcv) are the same. 3102 */ 3103 SOCKBUF_LOCK(&so->so_rcv); 3104 so->so_state &= ~SS_ISCONNECTING; 3105 so->so_state |= SS_ISDISCONNECTING; 3106 so->so_rcv.sb_state |= SBS_CANTRCVMORE; 3107 sorwakeup_locked(so); 3108 SOCKBUF_LOCK(&so->so_snd); 3109 so->so_snd.sb_state |= SBS_CANTSENDMORE; 3110 sowwakeup_locked(so); 3111 wakeup(&so->so_timeo); 3112} 3113 3114void 3115soisdisconnected(struct socket *so) 3116{ 3117 3118 /* 3119 * Note: This code assumes that SOCK_LOCK(so) and 3120 * SOCKBUF_LOCK(&so->so_rcv) are the same. 3121 */ 3122 SOCKBUF_LOCK(&so->so_rcv); 3123 so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING); 3124 so->so_state |= SS_ISDISCONNECTED; 3125 so->so_rcv.sb_state |= SBS_CANTRCVMORE; 3126 sorwakeup_locked(so); 3127 SOCKBUF_LOCK(&so->so_snd); 3128 so->so_snd.sb_state |= SBS_CANTSENDMORE; 3129 sbdrop_locked(&so->so_snd, so->so_snd.sb_cc); 3130 sowwakeup_locked(so); 3131 wakeup(&so->so_timeo); 3132} 3133 3134/* 3135 * Make a copy of a sockaddr in a malloced buffer of type M_SONAME. 3136 */ 3137struct sockaddr * 3138sodupsockaddr(const struct sockaddr *sa, int mflags) 3139{ 3140 struct sockaddr *sa2; 3141 3142 sa2 = malloc(sa->sa_len, M_SONAME, mflags); 3143 if (sa2) 3144 bcopy(sa, sa2, sa->sa_len); 3145 return sa2; 3146} 3147 3148/*	3091 } 3092 return; 3093 } 3094 SOCK_UNLOCK(so); 3095 ACCEPT_UNLOCK(); 3096 wakeup(&so->so_timeo); 3097 sorwakeup(so); 3098 sowwakeup(so); 3099} 3100 3101void 3102soisdisconnecting(struct socket *so) 3103{ 3104 3105 /* 3106 * Note: This code assumes that SOCK_LOCK(so) and 3107 * SOCKBUF_LOCK(&so->so_rcv) are the same. 
3108 */ 3109 SOCKBUF_LOCK(&so->so_rcv); 3110 so->so_state &= ~SS_ISCONNECTING; 3111 so->so_state |= SS_ISDISCONNECTING; 3112 so->so_rcv.sb_state |= SBS_CANTRCVMORE; 3113 sorwakeup_locked(so); 3114 SOCKBUF_LOCK(&so->so_snd); 3115 so->so_snd.sb_state |= SBS_CANTSENDMORE; 3116 sowwakeup_locked(so); 3117 wakeup(&so->so_timeo); 3118} 3119 3120void 3121soisdisconnected(struct socket *so) 3122{ 3123 3124 /* 3125 * Note: This code assumes that SOCK_LOCK(so) and 3126 * SOCKBUF_LOCK(&so->so_rcv) are the same. 3127 */ 3128 SOCKBUF_LOCK(&so->so_rcv); 3129 so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING); 3130 so->so_state |= SS_ISDISCONNECTED; 3131 so->so_rcv.sb_state |= SBS_CANTRCVMORE; 3132 sorwakeup_locked(so); 3133 SOCKBUF_LOCK(&so->so_snd); 3134 so->so_snd.sb_state |= SBS_CANTSENDMORE; 3135 sbdrop_locked(&so->so_snd, so->so_snd.sb_cc); 3136 sowwakeup_locked(so); 3137 wakeup(&so->so_timeo); 3138} 3139 3140/* 3141 * Make a copy of a sockaddr in a malloced buffer of type M_SONAME. 3142 */ 3143struct sockaddr * 3144sodupsockaddr(const struct sockaddr *sa, int mflags) 3145{ 3146 struct sockaddr *sa2; 3147 3148 sa2 = malloc(sa->sa_len, M_SONAME, mflags); 3149 if (sa2) 3150 bcopy(sa, sa2, sa->sa_len); 3151 return sa2; 3152} 3153 3154/*
	3155 * Register per-socket buffer upcalls. 3156 */ 3157void 3158soupcall_set(struct socket *so, int which, 3159 int (*func)(struct socket *, void *, int), void *arg) 3160{ 3161 struct sockbuf *sb; 3162 3163 switch (which) { 3164 case SO_RCV: 3165 sb = &so->so_rcv; 3166 break; 3167 case SO_SND: 3168 sb = &so->so_snd; 3169 break; 3170 default: 3171 panic("soupcall_set: bad which"); 3172 } 3173 SOCKBUF_LOCK_ASSERT(sb); 3174#if 0 3175 /* XXX: accf_http actually wants to do this on purpose. */ 3176 KASSERT(sb->sb_upcall == NULL, ("soupcall_set: overwriting upcall")); 3177#endif 3178 sb->sb_upcall = func; 3179 sb->sb_upcallarg = arg; 3180 sb->sb_flags |= SB_UPCALL; 3181} 3182 3183void 3184soupcall_clear(struct socket *so, int which) 3185{ 3186 struct sockbuf *sb; 3187 3188 switch (which) { 3189 case SO_RCV: 3190 sb = &so->so_rcv; 3191 break; 3192 case SO_SND: 3193 sb = &so->so_snd; 3194 break; 3195 default: 3196 panic("soupcall_clear: bad which"); 3197 } 3198 SOCKBUF_LOCK_ASSERT(sb); 3199 KASSERT(sb->sb_upcall != NULL, ("soupcall_clear: no upcall to clear")); 3200 sb->sb_upcall = NULL; 3201 sb->sb_upcallarg = NULL; 3202 sb->sb_flags &= ~SB_UPCALL; 3203} 3204 3205/*
3149 * Create an external-format (``xsocket'') structure using the information in 3150 * the kernel-format socket structure pointed to by so. This is done to 3151 * reduce the spew of irrelevant information over this interface, to isolate 3152 * user code from changes in the kernel structure, and potentially to provide 3153 * information-hiding if we decide that some of this information should be 3154 * hidden from users. 3155 */ 3156void 3157sotoxsocket(struct socket *so, struct xsocket *xso) 3158{ 3159 3160 xso->xso_len = sizeof *xso; 3161 xso->xso_so = so; 3162 xso->so_type = so->so_type; 3163 xso->so_options = so->so_options; 3164 xso->so_linger = so->so_linger; 3165 xso->so_state = so->so_state; 3166 xso->so_pcb = so->so_pcb; 3167 xso->xso_protocol = so->so_proto->pr_protocol; 3168 xso->xso_family = so->so_proto->pr_domain->dom_family; 3169 xso->so_qlen = so->so_qlen; 3170 xso->so_incqlen = so->so_incqlen; 3171 xso->so_qlimit = so->so_qlimit; 3172 xso->so_timeo = so->so_timeo; 3173 xso->so_error = so->so_error; 3174 xso->so_pgid = so->so_sigio ? 
so->so_sigio->sio_pgid : 0; 3175 xso->so_oobmark = so->so_oobmark; 3176 sbtoxsockbuf(&so->so_snd, &xso->so_snd); 3177 sbtoxsockbuf(&so->so_rcv, &xso->so_rcv); 3178 xso->so_uid = so->so_cred->cr_uid; 3179} 3180 3181 3182/* 3183 * Socket accessor functions to provide external consumers with 3184 * a safe interface to socket state 3185 * 3186 */ 3187 3188void 3189so_listeners_apply_all(struct socket *so, void (*func)(struct socket *, void *), void *arg) 3190{ 3191 3192 TAILQ_FOREACH(so, &so->so_comp, so_list) 3193 func(so, arg); 3194} 3195 3196struct sockbuf * 3197so_sockbuf_rcv(struct socket *so) 3198{ 3199 3200 return (&so->so_rcv); 3201} 3202 3203struct sockbuf * 3204so_sockbuf_snd(struct socket *so) 3205{ 3206 3207 return (&so->so_snd); 3208} 3209 3210int 3211so_state_get(const struct socket *so) 3212{ 3213 3214 return (so->so_state); 3215} 3216 3217void 3218so_state_set(struct socket *so, int val) 3219{ 3220 3221 so->so_state = val; 3222} 3223 3224int 3225so_options_get(const struct socket *so) 3226{ 3227 3228 return (so->so_options); 3229} 3230 3231void 3232so_options_set(struct socket *so, int val) 3233{ 3234 3235 so->so_options = val; 3236} 3237 3238int 3239so_error_get(const struct socket *so) 3240{ 3241 3242 return (so->so_error); 3243} 3244 3245void 3246so_error_set(struct socket *so, int val) 3247{ 3248 3249 so->so_error = val; 3250} 3251 3252int 3253so_linger_get(const struct socket *so) 3254{ 3255 3256 return (so->so_linger); 3257} 3258 3259void 3260so_linger_set(struct socket *so, int val) 3261{ 3262 3263 so->so_linger = val; 3264} 3265 3266struct protosw * 3267so_protosw_get(const struct socket *so) 3268{ 3269 3270 return (so->so_proto); 3271} 3272 3273void 3274so_protosw_set(struct socket *so, struct protosw *val) 3275{ 3276 3277 so->so_proto = val; 3278} 3279 3280void 3281so_sorwakeup(struct socket *so) 3282{ 3283 3284 sorwakeup(so); 3285} 3286 3287void 3288so_sowwakeup(struct socket *so) 3289{ 3290 3291 sowwakeup(so); 3292} 3293 3294void 
3295so_sorwakeup_locked(struct socket *so) 3296{ 3297 3298 sorwakeup_locked(so); 3299} 3300 3301void 3302so_sowwakeup_locked(struct socket *so) 3303{ 3304 3305 sowwakeup_locked(so); 3306} 3307 3308void 3309so_lock(struct socket *so) 3310{ 3311 SOCK_LOCK(so); 3312} 3313 3314void 3315so_unlock(struct socket *so) 3316{ 3317 SOCK_UNLOCK(so); 3318}	3206 * Create an external-format (``xsocket'') structure using the information in 3207 * the kernel-format socket structure pointed to by so. This is done to 3208 * reduce the spew of irrelevant information over this interface, to isolate 3209 * user code from changes in the kernel structure, and potentially to provide 3210 * information-hiding if we decide that some of this information should be 3211 * hidden from users. 3212 */ 3213void 3214sotoxsocket(struct socket *so, struct xsocket *xso) 3215{ 3216 3217 xso->xso_len = sizeof *xso; 3218 xso->xso_so = so; 3219 xso->so_type = so->so_type; 3220 xso->so_options = so->so_options; 3221 xso->so_linger = so->so_linger; 3222 xso->so_state = so->so_state; 3223 xso->so_pcb = so->so_pcb; 3224 xso->xso_protocol = so->so_proto->pr_protocol; 3225 xso->xso_family = so->so_proto->pr_domain->dom_family; 3226 xso->so_qlen = so->so_qlen; 3227 xso->so_incqlen = so->so_incqlen; 3228 xso->so_qlimit = so->so_qlimit; 3229 xso->so_timeo = so->so_timeo; 3230 xso->so_error = so->so_error; 3231 xso->so_pgid = so->so_sigio ? 
so->so_sigio->sio_pgid : 0; 3232 xso->so_oobmark = so->so_oobmark; 3233 sbtoxsockbuf(&so->so_snd, &xso->so_snd); 3234 sbtoxsockbuf(&so->so_rcv, &xso->so_rcv); 3235 xso->so_uid = so->so_cred->cr_uid; 3236} 3237 3238 3239/* 3240 * Socket accessor functions to provide external consumers with 3241 * a safe interface to socket state 3242 * 3243 */ 3244 3245void 3246so_listeners_apply_all(struct socket *so, void (*func)(struct socket *, void *), void *arg) 3247{ 3248 3249 TAILQ_FOREACH(so, &so->so_comp, so_list) 3250 func(so, arg); 3251} 3252 3253struct sockbuf * 3254so_sockbuf_rcv(struct socket *so) 3255{ 3256 3257 return (&so->so_rcv); 3258} 3259 3260struct sockbuf * 3261so_sockbuf_snd(struct socket *so) 3262{ 3263 3264 return (&so->so_snd); 3265} 3266 3267int 3268so_state_get(const struct socket *so) 3269{ 3270 3271 return (so->so_state); 3272} 3273 3274void 3275so_state_set(struct socket *so, int val) 3276{ 3277 3278 so->so_state = val; 3279} 3280 3281int 3282so_options_get(const struct socket *so) 3283{ 3284 3285 return (so->so_options); 3286} 3287 3288void 3289so_options_set(struct socket *so, int val) 3290{ 3291 3292 so->so_options = val; 3293} 3294 3295int 3296so_error_get(const struct socket *so) 3297{ 3298 3299 return (so->so_error); 3300} 3301 3302void 3303so_error_set(struct socket *so, int val) 3304{ 3305 3306 so->so_error = val; 3307} 3308 3309int 3310so_linger_get(const struct socket *so) 3311{ 3312 3313 return (so->so_linger); 3314} 3315 3316void 3317so_linger_set(struct socket *so, int val) 3318{ 3319 3320 so->so_linger = val; 3321} 3322 3323struct protosw * 3324so_protosw_get(const struct socket *so) 3325{ 3326 3327 return (so->so_proto); 3328} 3329 3330void 3331so_protosw_set(struct socket *so, struct protosw *val) 3332{ 3333 3334 so->so_proto = val; 3335} 3336 3337void 3338so_sorwakeup(struct socket *so) 3339{ 3340 3341 sorwakeup(so); 3342} 3343 3344void 3345so_sowwakeup(struct socket *so) 3346{ 3347 3348 sowwakeup(so); 3349} 3350 3351void 
3352so_sorwakeup_locked(struct socket *so) 3353{ 3354 3355 sorwakeup_locked(so); 3356} 3357 3358void 3359so_sowwakeup_locked(struct socket *so) 3360{ 3361 3362 sowwakeup_locked(so); 3363} 3364 3365void 3366so_lock(struct socket *so) 3367{ 3368 SOCK_LOCK(so); 3369} 3370 3371void 3372so_unlock(struct socket *so) 3373{ 3374 SOCK_UNLOCK(so); 3375}
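
Note on the soisconnected() hunk: the new revision registers the accept filter through soupcall_set(), calls the filter once directly, and, if the filter returns SU_ISCONNECTED, clears the upcall and jumps back to restart so the socket takes the ordinary so_comp path. The following is a minimal user-space sketch of that control flow only, not the kernel code. Every sketch_* name, the SKETCH_* constants and their values, and the two sample filters are hypothetical stand-ins invented for illustration, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel constants; values are arbitrary. */
#define SKETCH_SU_OK		0
#define SKETCH_SU_ISCONNECTED	1
#define SKETCH_SB_UPCALL	0x20

struct sketch_sock;
typedef int (*sketch_upcall_t)(struct sketch_sock *, void *, int);

/* Reduced model of struct sockbuf: only the upcall fields. */
struct sketch_sockbuf {
	sketch_upcall_t	sb_upcall;
	void		*sb_upcallarg;
	int		sb_flags;
};

/* Reduced model of struct socket. */
struct sketch_sock {
	struct sketch_sockbuf so_rcv;
	int	filter_pending;	/* models SO_ACCEPTFILTER in so_options */
	int	on_comp;	/* models SQ_COMP in so_qstate */
};

/* Mirrors soupcall_set(): install the callback and mark the buffer. */
void
sketch_upcall_set(struct sketch_sockbuf *sb, sketch_upcall_t func, void *arg)
{
	sb->sb_upcall = func;
	sb->sb_upcallarg = arg;
	sb->sb_flags |= SKETCH_SB_UPCALL;
}

/* Mirrors soupcall_clear(): there must be an upcall to remove. */
void
sketch_upcall_clear(struct sketch_sockbuf *sb)
{
	assert(sb->sb_upcall != NULL);
	sb->sb_upcall = NULL;
	sb->sb_upcallarg = NULL;
	sb->sb_flags &= ~SKETCH_SB_UPCALL;
}

/* Control flow of the new soisconnected() filter branch, locking omitted. */
void
sketch_isconnected(struct sketch_sock *so, sketch_upcall_t filter, void *arg)
{
	int ret;

restart:
	if (!so->filter_pending) {
		so->on_comp = 1;	/* socket is now available to accept() */
		return;
	}
	sketch_upcall_set(&so->so_rcv, filter, arg);
	so->filter_pending = 0;
	ret = filter(so, arg, 0);
	if (ret == SKETCH_SU_ISCONNECTED) {
		sketch_upcall_clear(&so->so_rcv);
		goto restart;	/* second pass takes the non-filter path */
	}
	/* Otherwise the upcall stays installed until more data arrives. */
}

/* Sample filter that is satisfied immediately. */
int
sketch_filter_ready(struct sketch_sock *so, void *arg, int waitflag)
{
	(void)so; (void)arg; (void)waitflag;
	return (SKETCH_SU_ISCONNECTED);
}

/* Sample filter that still wants more data. */
int
sketch_filter_wait(struct sketch_sock *so, void *arg, int waitflag)
{
	(void)so; (void)arg; (void)waitflag;
	return (SKETCH_SU_OK);
}
```

With sketch_filter_ready the socket ends up on the modelled completed queue with no upcall left installed; with sketch_filter_wait it stays off that queue and keeps the upcall, which corresponds to the case where the old code simply invoked so->so_upcall and returned.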