History log of /freebsd-11-stable/sys/sys/socket.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 338617 12-Sep-2018 sobomax

MFC r312296 and r323254, which is new a socket option
SO_TS_CLOCK to pick from several different clock sources to
return timestamps when SO_TIMESTAMP is enabled and two
new nanosecond-precision timestamp types. This also fixes
recvmsg32() system call to properly down-convert layout of the
64-bit structures to match what 32-bit app(s) expect.

Bump __FreeBSD_version to indicate presence of a new
functionality.

Differential Revision: https://reviews.freebsd.org/D9171


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 313567 10-Feb-2017 jhb

MFC 311568,311584,312387:
Set MORETOCOME for AIO write requests on a socket.

311568:
Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

311584:
Unbreak lib/libsysdecode after r311568 by decoding MSG_MORETOCOME flag
in msgflags

(Actually, this change excludes MSG_MORETOCOME from being decoded)

312387:
Fix regression from r311568: collision of MSG_NOSIGNAL with MSG_MORETOCOME
lead to delayed send of data sent with sendto(MSG_NOSIGNAL).

Sponsored by: Chelsio Communications


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 301038 31-May-2016 ed

Make CMSG_*() work without having NULL available.

The <sys/socket.h> is not supposed to declare NULL, according to POSIX.
Our implementation complies with that, meaning that we need to make sure
that CMSG_*() doesn't use it.


# 297400 29-Mar-2016 glebius

The sendfile(2) allows to send extra data from userspace before the file
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.

Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix


# 295039 29-Jan-2016 kib

Add implementations of sendmmsg(3) and recvmmsg(3) functions which
wraps sendmsg(2) and recvmsg(2) into batch send and receive operation.
The goal of this implementation is only to provide API compatibility
with Linux.

The cancellation behaviour of the functions is not quite right, but
due to relative rare use of cancellation it is considered acceptable
comparing with the complexity of the correct implementation. If
functions are reimplemented as syscalls, the fix would come almost
trivial. The direct use of the syscall trampolines instead of libc
wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on
cancellation.

Submitted by: Boris Astardzhiev <boris.astardzhiev@gmail.com>
Discussed with: jilles (cancellation behaviour)
MFC after: 1 month


# 293439 08-Jan-2016 glebius

New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.

Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.

The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates: Netflix global launch!
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Relnotes: yes


# 274403 11-Nov-2014 glebius

Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by: Netflix


# 262489 25-Feb-2014 jhb

Remove more constants related to static sysctl nodes. The MAXID constants
were primarily used to size the sysctl name list macros that were removed
in r254295. A few other constants either did not have an associated
sysctl node, or the associated node used OID_AUTO instead.

PR: ports/184525 (exp-run)


# 260804 17-Jan-2014 adrian

Implement the extension api for sendfile to allow for kqueue notifications.

This is still under a bit of flux, as the final API hasn't been nailed
down. It's also unclear whether we should define the two new types in the
header or not - it may allow bad code to compile that shouldn't (ie,
since uintX's are defined, the developer may not include sys/types.h.)

Reviewed by: peter, imp, bde
Sponsored by: Netflix, Inc.


# 254925 26-Aug-2013 jhb

Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by: bde
Tested by: exp-run by bdrewery


# 254356 15-Aug-2013 glebius

Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 254122 09-Aug-2013 jeff

- Reserve a special AF for SDP. The one we were incorrectly using before
was taken by another AF.

Sponsored by: EMC / Isilon Storage Division


# 251673 12-Jun-2013 kevlo

Add PF_IEEE80211 definition.

Reviewed by: rpaulo


# 250154 01-May-2013 jilles

Add accept4() system call.

The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.


# 248932 30-Mar-2013 jilles

Improve namespacing in <sys/socket.h>:

* MSG_NOSIGNAL is in POSIX.1-2008.
* MSG_NOTIFICATION (SCTP) is not in POSIX.
* PRU_FLUSH_* (SCTP) are not in POSIX.
* bindat()/connectat() are not in POSIX.

Discussed with: rrs (PRU_FLUSH_*)


# 248534 19-Mar-2013 jilles

Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.

This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags
argument.

Reviewed by: kib


# 247667 02-Mar-2013 pjd

- Implement two new system calls:

int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that has to be present on
the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des


# 246210 01-Feb-2013 jhb

Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after: 2 weeks


# 232179 26-Feb-2012 kib

Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks


# 231520 11-Feb-2012 bz

Properly name the sysctl to "iflistl" rather than "iflist2", which had been
the prototype name and slipped in in r231505.

Spotted in a reply from: bde
MFC after: 3 days


# 231505 11-Feb-2012 bz

Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by: glebius, brooks
MFC after: 3 days


# 220742 17-Apr-2011 jilles

Allow using CMSG_NXTHDR with -Wcast-align.

If various checks are omitted, the CMSG_NXTHDR macro expands to
(struct cmsghdr *)((char *)(cmsg) + \
_ALIGN(((struct cmsghdr *)(cmsg))->cmsg_len))

Although there is no alignment problem (assuming cmsg is properly aligned
and _ALIGN is correct), this violates -Wcast-align on strict-alignment
architectures. Therefore an intermediate cast to void * is appropriate here.

There is no workaround other than not using -Wcast-align.

MFC after: 2 weeks


# 215178 12-Nov-2010 luigi

This commit implements the SO_USER_COOKIE socket option, which lets
you tag a socket with an uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its
implementation).

The ipfw-related code that uses the optopn will be committed separately.

This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be
harmless.

See the discussion at
http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html

Idea and code from Paul Joe, small modifications and manpage
changes by myself.

Submitted by: Paul Joe
MFC after: 1 week


# 201955 09-Jan-2010 brooks

Improve the comment about CMGROUP_MAX.

MFC after: 3 days


# 196994 08-Sep-2009 phk

Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.


# 196967 08-Sep-2009 phk

Move the duplicate definition of struct sockaddr_storage to its own
include file, and include this where the previous duplicate definitions were.

Static program checkers like FlexeLint rightfully take a dim view of
duplicate definitions, even if they currently are identical.


# 181440 08-Aug-2008 delphij

Add prototype defination for setfib(2) to sys/socket.h.


# 180641 20-Jul-2008 kmacy

Add accessor functions for socket fields.

MFC after: 1 week


# 178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 178200 14-Apr-2008 rrs

Add pru_flush routine so a transport can
flush itself during Shutdown

MFC after: 1 week


# 175941 03-Feb-2008 phk

Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs
referencing the files VM pages are returned from the network stack,
making changes to the file safe.

This flag does not guarantee that the data has been transmitted to the
other end.


# 174560 12-Dec-2007 kmacy

Fix style issues with initial TCP offload commit

Requested by: rwatson
Submitted by: rwatson


# 174556 12-Dec-2007 kmacy

Add driver independent interface to offload active established TCP connections

Reviewed by: silby


# 172217 18-Sep-2007 alfred

Reserve AF_ constants for vendors by giving them the odd numbered
AF_ constants ranging from 39 to 133.

Approved by: re (kensmith)


# 170613 12-Jun-2007 bms

Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)


# 168867 19-Apr-2007 mtm

Make inet6_rth_* family of functions more compliant with RFC3542:
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.

MFC-After: 1 month


# 163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


# 163913 02-Nov-2006 andre

Rewrite kern_sendfile() to work in two loops, the inner which turns as many
VM pages into mbufs as it can -- up to the free send socket buffer space.
The outer loop then drops the whole mbuf chain into the send socket buffer,
calls tcp_output() on it and then waits until 50% of the socket buffer are
free again to repeat the cycle. This way tcp_output() gets the full amount
of data to work with and can issue up to 64K sends for TSO to chop up in
the network adapter without using any CPU cycles. Thus it gets very efficient
especially with the readahead the VM and I/O system do.

The previous sendfile(2) code simply looped over the file, turned each 4K
page into an mbuf and sent it off. This had the effect that TSO could only
generate 2 packets per send instead of up to 44 at its maximum of 64K.

Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of
sleeping on mbuf allocation failures.

Benchmarking shows significant improvements (95% confidence):
45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO)
83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO)

(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month


# 160690 26-Jul-2006 sam

add support for 802.11 packet injection via bpf

Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by: arch@
MFC after: 1 month


# 150302 18-Sep-2005 rwatson

Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them
consistent.

Discussed with: andre
MFC after: 1 week


# 144978 12-Apr-2005 mdodd

Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.

- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)

Obtained from: NetBSD (with some changes)


# 143308 08-Mar-2005 alfred

Make MSG_NOSIGNAL available to native programs.
Bump FreeBSD_version to note this change.

Reviewed by: sobomax


# 143295 08-Mar-2005 sobomax

Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by: alfred


# 139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 138206 29-Nov-2004 ps

If soreceive() is called from a socket callback, there's no reason
to do a window update to the peer (thru an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.

Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson


# 133478 11-Aug-2004 andre

RFC 2292 requires to check msg_controllen, in case that the kernel returns
an empty list for some reasons.

Obtained from:

NetBSD: socket.h,v 1.62 2001/09/07 08:13:01 itojun
OpenBSD: socket.h,v 1.39 2001/09/07 16:45:25 itojun

MFC after: 2 weeks


# 132258 16-Jul-2004 harti

According to POSIX sys/socket.h must define CMSG_NXTHDR but most not
define NULL. This means we cannot use NULL in the definition of CMSG_NXTHDR.
So replace NULL with 0.

PR: kern/60309
Submitted by: Jeff King <peff-freebsd@peff.net>


# 129929 01-Jun-2004 truckman

Whitespace correction - #define should be followed by a tab.


# 129911 31-May-2004 truckman

Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.


# 129079 10-May-2004 emax

Mode few Bluetooth defines into system include files

Reviewed by: imp


# 127976 07-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 126937 13-Mar-2004 mdodd

Define AF_ARP/PF_ARP.


# 125586 08-Feb-2004 silby

Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.

Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver. Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.

Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.

Idea by: Yaoping Ruan & Vivek Pai


# 125264 31-Jan-2004 phk

Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead. Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.


# 123811 24-Dec-2003 alfred

Add restrict qualifiers.

PR: 44394
Submitted by: Craig Rodrigues <rodrige@attbi.com>


# 122685 14-Nov-2003 bms

Add a sysctl MIB, NET_RT_IFMALIST, to retrieve multicast group memberships
in a protocol-independent way.

Submitted by: harti


# 111926 05-Mar-2003 peter

Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.


# 111567 26-Feb-2003 mike

Move the typedef for size_t into _iovec.h, so that size_t is available
for struct iovec.


# 108366 28-Dec-2002 phk

It is bad style to define the same structure in multiple header
files which might be included together.

Things like debuggers and lint-like programs get their knickers in
a twist (rightly so one might add) when they find different locations
for the same named struct depending on which .h file were included
first.

This is a stellar example of Very Bad Thinking on the part of the
standards dudes who wrote that both sys/uio.h and sys/socket.h
should define struct iovec the same way.

Fix this by putting struct iovec into its own miniature sys/_iovec.h
file and #include that from sys/socket.h and sys/uio.h.

Sensible people could just put iovec into sys/_types.h but there
is probably some standard or other which will be violated if we
did something that horrible.


# 107872 14-Dec-2002 fenner

Add prototype for sockatmark().


# 106847 13-Nov-2002 mike

Fix a constant in the standard namespace not to depend on another
constant in the BSD namespace.


# 104983 12-Oct-2002 mike

o Add typedefs for size_t and ssize_t.
o Add typedefs for gid_t, off_t, pid_t, and uid_t in the non-standards
case.
o Add struct iovec (also defined in <sys/uio.h>).
o Add visibility conditionals to avoid defining non-standard
extentions in the standards case.
o Change spelling of some types so they work without including
<sys/types.h> (u_char -> unsigned char, u_short -> unsigned short,
int64 -> __int64, caddr_t -> char *)
o Add comments about missing restrict type-qualifiers and missing
function.


# 102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


# 98499 20-Jun-2002 alfred

Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin


# 98213 14-Jun-2002 rwatson

Reserve two constants for managing socket MAC labels via socket options.


# 98212 14-Jun-2002 rwatson

Whitespaec consistency.


# 98118 11-Jun-2002 wollman

SO_PRIVSTATE has been commented out for long enough now....


# 97728 02-Jun-2002 alfred

bde noticed that SOMAXCONN breaks pretty badly as an option for LINT.
so back it out.


# 95099 20-Apr-2002 mike

Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h>
and <sys/socket.h>. Previously, sa_family_t was only typedef'd in
<sys/socket.h>.


# 92719 19-Mar-2002 alfred

Remove __P


# 90135 03-Feb-2002 markm

Zero functional difference; make some integer constants unsigned, as
they are used in unsigned context. This shuts lint(1) up in a few
significant ways with "signed/unsigned" arithmetic warnings.


# 83045 04-Sep-2001 obrien

style(9) the structure definitions.


# 78137 12-Jun-2001 ume

FreeBSD already avoided namespace pollution (rev.1.45).

Submitted by: bde


# 78107 11-Jun-2001 ume

This is force commit to mention about previous commit.

- avoid namespace pollution by CMSG_ALIGN().


# 78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 75847 23-Apr-2001 grog

Add address families AF_SLOW and AF_SCLUSTER. These are used by the
Sitara QoSworks box.

Obtained from: Sitara Networks Inc.


# 75458 13-Apr-2001 alfred

Make SOMAXCONN a kernel option.

Submitted by: Terry Lambert <terry@lambert.org>


# 74660 22-Mar-2001 alfred

Remove struct cmessage from sys/socket.h and reintroduce the private
definitions.

Requested by: wollman


# 74627 22-Mar-2001 alfred

Hopefully fix some of the bugs in passing credentials over UNIX domain sockets.

Make struct cmessage visible from socket.h (about 4 places were
defining it for themselves which wasn't good)

Make __rpc_get_local_uid() useable and give it prototype that's
visible.

Fix some issues with printing out usernames from rpcbind and keyserv.


# 72554 17-Feb-2001 bde

Fixed disordering in previous commit.


# 72510 15-Feb-2001 ume

Correct 2nd argument of getnameinfo(3) to socklen_t.

Reviewed by: itojun


# 70184 19-Dec-2000 assar

remove pfctlinput


# 69042 22-Nov-2000 asmodai

Reduce number of #ifdef nestings.

Submitted by: bde


# 68498 08-Nov-2000 asmodai

Fix CMSG and ALIGN macro usage.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h. However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
definitions.
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as
before.

PR: 19971, 20530
Submitted by: Martin Kaeske <MartinKaeske@lausitz.net>
Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by: shin
Reviewed by: bde


# 66255 22-Sep-2000 asmodai

Document which RFC introduced CMSG_SPACE() and CMSG_LEN().


# 66240 22-Sep-2000 asmodai

Fix comment about the bsd-api-new-02a draft. This has been superceded
by RFC 2553.


# 61837 19-Jun-2000 alfred

return of the accept filter part II

accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided, one that returns sockets when data
arrives the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg


# 61799 18-Jun-2000 alfred

backout accept optimizations.

Requested by: jmg, dcs, jdp, nate


# 61714 15-Jun-2000 alfred

add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()
until the incoming connection has either data waiting or what looks like a
HTTP request header already in the socketbuffer. This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers and has
been cleaned up and a more lightweight DELAYACCEPT for non-http servers
has been added

Reviewed by: silence on hackers.


# 57911 11-Mar-2000 shin

Fix sockaddr_storage related macro definition, as ss_family member type change.
(Currently, no effect but for future portability)

Approved by: jkh

Reviewed by: bde


# 57719 03-Mar-2000 shin

CMSG_XXX macros alignment fixes to follow RFC2292.

Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun


# 55917 13-Jan-2000 shin

Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.


# 55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 55009 22-Dec-1999 shin

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 53678 24-Nov-1999 phk

General clean-up of socket.h and associated sources to synchronise up
with NetBSD and the Single Unix Specification v2.

This updates some structures with other, almost equivalent types and
effort is under way to get the whole more consistent.

Also removes a double definition of INET6 and some other clean-ups.

Reviewed by: green, bde, phk
Some part obtained from: NetBSD, SUSv2 specification


# 52904 05-Nov-1999 shin

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


# 52435 22-Oct-1999 julian

Add missing entries in a structure.


# 52424 21-Oct-1999 julian

fix typo


# 52419 21-Oct-1999 julian

Whistle's Netgraph link-layer (sometimes more) networking infrastructure.
Been in production for 3 years now. Gives Instant Frame relay to if_sr
and if_ar drivers, and PPPOE support soon. See:
ftp://ftp.whistle.com/pub/archie/netgraph/index.html
for on-line manual pages.

Reviewed by: Doug Rabson (dfr@freebsd.org)
Obtained from: Whistle CVS tree


# 52248 15-Oct-1999 msmith

Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 40931 05-Nov-1998 dg

Implemented zero-copy TCP/IP extensions via sendfile(2) - send a
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.


# 39271 15-Sep-1998 phk

(this is an extract from src/share/examples/atm/README)

===================================
HARP | Host ATM Research Platform
===================================

HARP 3

What is this stuff?
-------------------
The Advanced Networking Group (ANG) at the Minnesota Supercomputer Center,
Inc. (MSCI), as part of its work on the MAGIC Gigabit Testbed, developed
the Host ATM Research Platform (HARP) software, which allows IP hosts to
communicate over ATM networks using standard protocols. It is intended to
be a high-quality platform for IP/ATM research.

HARP provides a way for IP hosts to connect to ATM networks. It supports
standard methods of communication using IP over ATM. A host's standard IP
software sends and receives datagrams via a HARP ATM interface. HARP provides
functionality similar to (and typically replaces) vendor-provided ATM device
driver software.

HARP includes full source code, making it possible for researchers to
experiment with different approaches to running IP over ATM. HARP is
self-contained; it requires no other licenses or commercial software packages.

HARP implements support for the IETF Classical IP model for using IP over ATM
networks, including:

o IETF ATMARP address resolution client
o IETF ATMARP address resolution server
o IETF SCSP/ATMARP server
o UNI 3.1 and 3.0 signalling protocols
o Fore Systems's SPANS signalling protocol

What's supported
----------------
The following are supported by HARP 3:

o ATM Host Interfaces
- FORE Systems, Inc. SBA-200 and SBA-200E ATM SBus Adapters
- FORE Systems, Inc. PCA-200E ATM PCI Adapters
- Efficient Networks, Inc. ENI-155p ATM PCI Adapters

o ATM Signalling Protocols
- The ATM Forum UNI 3.1 signalling protocol
- The ATM Forum UNI 3.0 signalling protocol
- The ATM Forum ILMI address registration
- FORE Systems's proprietary SPANS signalling protocol
- Permanent Virtual Channels (PVCs)

o IETF "Classical IP and ARP over ATM" model
- RFC 1483, "Multiprotocol Encapsulation over ATM Adaptation Layer 5"
- RFC 1577, "Classical IP and ARP over ATM"
- RFC 1626, "Default IP MTU for use over ATM AAL5"
- RFC 1755, "ATM Signaling Support for IP over ATM"
- RFC 2225, "Classical IP and ARP over ATM"
- RFC 2334, "Server Cache Synchronization Protocol (SCSP)"
- Internet Draft draft-ietf-ion-scsp-atmarp-00.txt,
"A Distributed ATMARP Service Using SCSP"

o ATM Sockets interface
- The file atm-sockets.txt contains further information

What's not supported
--------------------
The following major features of the above list are not currently supported:

o UNI point-to-multipoint support
o Driver support for Traffic Control/Quality of Service
o SPANS multicast and MPP support
o SPANS signalling using Efficient adapters

This software was developed under the sponsorship of the Defense Advanced
Research Projects Agency (DARPA).

Reviewed (lightly) by: phk
Submitted by: Network Computing Services, Inc.


# 39114 12-Sep-1998 wollman

Define the Posix.1g names for the howto argument to shutdown(2).


# 33006 01-Feb-1998 alex

Added inet6 to CTL_NET_NAMES.


# 31927 21-Dec-1997 bde

Moved some declarations from <sys/socket.h> to the correct places, and
fixed everything that depended on them being misplaced.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


# 25609 09-May-1997 kjc

merge ATM driver


# 25311 30-Apr-1997 wollman

Remove SO_PRIVSTATE socket option; it is no longer necessary, nor implemented
in the kernel. inetd should automatically notic that it has gone away
once it is recompiled.


# 24083 21-Mar-1997 wpaul

Add support to sendmsg()/recvmsg() for passing credentials between
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
UID of a local process on the other side of a transport endpoint).

What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmesgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the follwing tests:

- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.

- The the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.

- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.

The receiver can now inspect the credential information and use it to
authenticate the client.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 17938 30-Aug-1996 peter

Grab the next slot for AF_INET6/PF_INET6, the resolver uses it.


# 17603 15-Aug-1996 jdp

Fix a typo in the #define for PF_RTIP, even though I doubt it will
ever make one bit of difference to anybody.


# 16480 18-Jun-1996 wollman

When bringing the netkey stuff over, I forgot that I had decided to change
AF_KEY into pseudo_AF_KEY, and defined PF_KEY incorrectly. Fix.

Noticed by: pst


# 16368 14-Jun-1996 wollman

This is the `netkey' kernel key-management service (the PF_KEY analogue
to PF_ROUTE) from NRL's IPv6 distribution, heavily modified by me for
better source layout, formatting, and textual conventions. I am told
that this code is no longer under active development, but it's a useful
hack for those interested in doing work on network security, key management,
etc. This code has only been tested twice, so it should be considered
highly experimental.

Obtained from: ftp.ripe.net


# 15701 09-May-1996 wollman

Make it possible to return more than one piece of control information
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).

Submitted by: Louis Mamakos <loiue@TransSys.com>


# 13955 07-Feb-1996 wollman

Define a new socket option, SO_PRIVSTATE. Getting it returns the state
of the SS_PRIV flag in so_state; setting it always clears same.


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 13258 05-Jan-1996 dg

Increased default SOMAXCONN from 32 to 128. 128 is the largest value I
consider "safe" for most systems. Note that this is (has been for some
time) also tunable with sysctl (via kern.somaxconn) should the operator
wish to increase this value even higher. Also note that 128 is what
the Netscape WWW server reportedly asks for.


# 10706 13-Sep-1995 dg

Increased SOMAXCONN from 5 to 32. 5 was too small a value for just about
any reasonably busy machine, and by any measure is a lousy "max" value.
32 was chosen after a careful analysis of typical listen queue depths
on several busy Internet servers (both web and ftp). I also intend to
add a statistics counter for dropped connection requests due to the limit
being exceeded.


# 6223 07-Feb-1995 wollman

Merge in the socket-level support for Transaction TCP from the OLAH_TTCP
branch.

Submitted by: Andras Olah <olah@cs.utwente.nl>


# 5413 05-Jan-1995 se

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


# 3438 08-Oct-1994 phk

Added prototypes here and there. Moved pfctlinput into socket.h.


# 3304 02-Oct-1994 phk

Prototypes, prototypes and even more prototypes. Not quite done yet, but
getting closer all the time.


# 1817 02-Aug-1994 dg

Added $Id$


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources