History log of /openbsd-current/sys/netinet6/in6_pcb.c
Revision Date Author Comments
# 1.144 12-Apr-2024 bluhm

Split single TCP inpcb table into IPv4 and IPv6 parts.

With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on internet
PCB table mutex will be reduced. UDP has been split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.

OK mvs@


# 1.143 31-Mar-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions.

A previous version of this diff was backed out. There was an
additional rtisvalid() in rtalloc_mpath() that prevented packet
output via interfaces that were not up. Now the route in the cache
has to be valid, but after new lookup, rtalloc_mpath() may return
invalid routes. This generates fewer errors in userland and preserves
existing behavior.

OK sashan@


# 1.142 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere with how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@
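
A minimal sketch of the difference between the two uniqueness rules, not the kernel code. struct binding and both helper functions are hypothetical names, and treating address 0 as a wildcard that conflicts with every address is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bound socket; not struct inpcb. */
struct binding {
	uint32_t laddr;		/* local IPv4 address, 0 means wildcard */
	uint16_t lport;		/* local port */
};

/* Globally unique: a port is taken as soon as any socket uses it. */
static bool
port_in_use_global(const struct binding *tab, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].lport == port)
			return true;
	return false;
}

/* Unique per laddr: a port is taken only on the same or a wildcard address. */
static bool
port_in_use_for_addr(const struct binding *tab, size_t n, uint32_t laddr,
    uint16_t port)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].lport != port)
			continue;
		if (tab[i].laddr == 0 || laddr == 0 || tab[i].laddr == laddr)
			return true;
	}
	return false;
}
```

With one socket bound to 10.0.0.1 port 50000, the global check rejects port 50000 for every address, while the per-address check still allows it for 10.0.0.2.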


Revision tags: OPENBSD_7_5_BASE
# 1.141 29-Feb-2024 naddy

revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"

It breaks NFS.

ok claudio@


# 1.140 27-Feb-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.

OK claudio@


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@
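
A rough sketch of a cache check keyed on generation number, destination and source, as described above. struct route_sketch, cache_check() and multipath_sysctl_changed() are hypothetical names; the real struct route and route_cache() differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global; bumped whenever cached routes must be invalidated. */
static unsigned long route_generation = 1;

/* Simplified stand-in for the cached part of struct route. */
struct route_sketch {
	unsigned long gen;	/* generation number at fill time */
	uint32_t dst;		/* destination the cache was filled for */
	uint32_t src;		/* source address used for multipath selection */
	bool filled;
};

/* Return true on a cache hit; on a miss record the new key and return false. */
static bool
cache_check(struct route_sketch *ro, uint32_t dst, uint32_t src)
{
	if (ro->filled && ro->gen == route_generation &&
	    ro->dst == dst && ro->src == src)
		return true;
	ro->gen = route_generation;
	ro->dst = dst;
	ro->src = src;
	ro->filled = true;
	return false;
}

/* Models a change of the multipath sysctl: all cached entries become stale. */
static void
multipath_sysctl_changed(void)
{
	route_generation++;
}
```

A miss means the caller has to perform a fresh route lookup; a changed source address or a bumped generation number both force that path.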


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
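
The grab-or-hold convention can be sketched in userland with a pthread mutex. This is an illustration under assumptions: enum lock_mode, struct pcb_sketch, struct table_sketch and lookup_lock() are hypothetical names standing in for IN_PCBLOCK_GRAB, IN_PCBLOCK_HOLD and the in_pcblookup_lock() family.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for IN_PCBLOCK_GRAB and IN_PCBLOCK_HOLD. */
enum lock_mode { LOCK_GRAB, LOCK_HOLD };

struct pcb_sketch {
	unsigned short lport;
	int refcnt;
};

struct table_sketch {
	pthread_mutex_t mtx;		/* protects pcbs[] */
	struct pcb_sketch *pcbs;
	size_t npcbs;
};

/*
 * With LOCK_GRAB the function takes and releases the mutex itself and
 * returns the pcb with an extra reference.  With LOCK_HOLD the caller
 * already holds the mutex and is responsible for the pcb's lifetime.
 */
static struct pcb_sketch *
lookup_lock(struct table_sketch *t, unsigned short lport, enum lock_mode mode)
{
	struct pcb_sketch *found = NULL;

	if (mode == LOCK_GRAB)
		pthread_mutex_lock(&t->mtx);
	for (size_t i = 0; i < t->npcbs; i++) {
		if (t->pcbs[i].lport == lport) {
			found = &t->pcbs[i];
			break;
		}
	}
	if (mode == LOCK_GRAB) {
		if (found != NULL)
			found->refcnt++;
		pthread_mutex_unlock(&t->mtx);
	}
	return found;
}
```

A caller that needs lookup and a later update in one critical section would lock the mutex itself, call lookup_lock() with LOCK_HOLD, and unlock only after the update.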


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
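
A small userland sketch of the ordering described above: the hash is computed first and only the bucket walk runs under the mutex. All names are hypothetical, and FNV-1a is used purely as a stand-in for SipHash with a constant secret.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8	/* power of two, so the hash can be masked */

struct entry {
	uint32_t addr;
	uint16_t port;
	struct entry *next;
};

struct table {
	pthread_mutex_t mtx;			/* protects buckets[] */
	struct entry *buckets[NBUCKETS];
};

/* Depends only on its arguments, so it is safe to call without the mutex. */
static uint32_t
key_hash(uint32_t addr, uint16_t port)
{
	uint8_t key[6] = {
		(uint8_t)(addr >> 24), (uint8_t)(addr >> 16),
		(uint8_t)(addr >> 8), (uint8_t)addr,
		(uint8_t)(port >> 8), (uint8_t)port
	};
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < sizeof(key); i++) {
		h ^= key[i];
		h *= 16777619u;
	}
	return h;
}

static void
table_insert(struct table *t, struct entry *e)
{
	uint32_t h = key_hash(e->addr, e->port);	/* before locking */

	pthread_mutex_lock(&t->mtx);
	e->next = t->buckets[h & (NBUCKETS - 1)];
	t->buckets[h & (NBUCKETS - 1)] = e;
	pthread_mutex_unlock(&t->mtx);
}

static struct entry *
table_lookup(struct table *t, uint32_t addr, uint16_t port)
{
	uint32_t h = key_hash(addr, port);		/* before locking */
	struct entry *e;

	pthread_mutex_lock(&t->mtx);
	for (e = t->buckets[h & (NBUCKETS - 1)]; e != NULL; e = e->next)
		if (e->addr == addr && e->port == port)
			break;
	pthread_mutex_unlock(&t->mtx);
	return e;
}
```

The sketch returns a bare pointer after unlocking, which is only safe here because nothing frees entries; real code would take a reference before dropping the mutex.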


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
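
To illustrate what symmetric means here, a toy flow hash that gives the same value when the foreign and local sides are swapped. This is not the kernel's stoeplitz code; sym_flow_hash() is a hypothetical function that gets its symmetry from XOR being commutative.

```c
#include <assert.h>
#include <stdint.h>

/* Toy symmetric flow hash; NOT the stoeplitz implementation. */
static uint16_t
sym_flow_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	uint32_t a = faddr ^ laddr;		/* same for (f,l) and (l,f) */
	uint16_t p = (uint16_t)(fport ^ lport);	/* same for (f,l) and (l,f) */
	uint32_t x = a ^ p;

	x ^= x >> 16;				/* fold into 16 bits */
	return (uint16_t)x;
}
```

Because both inputs are combined before any mixing, swapping the argument order cannot change the result, which is why the argument swap in this commit is cosmetic.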


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@
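
A simplified sketch of why the lookup is repeated. struct conn, tuple_exists() and connect_sketch() are hypothetical, and the autoport argument stands in for whatever port the automatic bind would select.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct conn {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

static bool
tuple_exists(const struct conn *tab, size_t n, const struct conn *c)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].laddr == c->laddr && tab[i].lport == c->lport &&
		    tab[i].faddr == c->faddr && tab[i].fport == c->fport)
			return true;
	return false;
}

static int
connect_sketch(const struct conn *tab, size_t n, struct conn *c,
    uint16_t autoport)
{
	/* First check: done with the local port as it is right now. */
	if (tuple_exists(tab, n, c))
		return EADDRINUSE;
	if (c->lport == 0) {
		c->lport = autoport;	/* automatic bind during connect */
		/* The tuple changed, so the first check no longer covers it. */
		if (tuple_exists(tab, n, c)) {
			c->lport = 0;
			return EADDRINUSE;
		}
	}
	return 0;
}
```

Without the second check, an unbound socket that is handed an already used local port would end up with the same 4-tuple as an existing connection.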


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
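
The two calling conventions can be contrasted in a short sketch. Function names, the address table and the matching rule (same first octet) are made up for illustration and do not reflect the real in_selectsrc() logic or its full parameter list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local addresses: 10.0.0.1 and 192.168.0.1. */
static const uint32_t local_addrs[] = { 0x0a000001, 0xc0a80001 };
#define NLOCAL (sizeof(local_addrs) / sizeof(local_addrs[0]))

/* Old shape: the address is the return value, the error goes via pointer. */
static const uint32_t *
selectsrc_old(uint32_t dst, int *errorp)
{
	for (size_t i = 0; i < NLOCAL; i++) {
		if ((local_addrs[i] >> 24) == (dst >> 24)) {
			*errorp = 0;
			return &local_addrs[i];
		}
	}
	*errorp = EADDRNOTAVAIL;
	return NULL;
}

/* New shape: the error is the return value, the address goes via pointer. */
static int
selectsrc_new(const uint32_t **srcp, uint32_t dst)
{
	for (size_t i = 0; i < NLOCAL; i++) {
		if ((local_addrs[i] >> 24) == (dst >> 24)) {
			*srcp = &local_addrs[i];
			return 0;
		}
	}
	return EADDRNOTAVAIL;
}
```

With the new shape a caller can write the usual error check on the return value and does not need a separate NULL test on the result.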


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing are pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
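
Why a plain modulo is not enough for bounds that are not a power of two can be shown with a rejection-sampling sketch. uniform_sketch() and the deterministic fake_rand32() generator are hypothetical; the sketch only illustrates the idea behind a uniform bounded draw and does not claim to be the kernel's arc4random_uniform().

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*rand32_fn)(void);

/* Deterministic stand-in generator so the sketch can be exercised. */
static uint32_t fake_state;

static uint32_t
fake_rand32(void)
{
	return fake_state++;
}

/*
 * Return a value in [0, upper_bound) without modulo bias.  Values below
 * min = 2^32 mod upper_bound are rejected, so the accepted range has a
 * size that is an exact multiple of upper_bound.
 */
static uint32_t
uniform_sketch(rand32_fn rand32, uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;
	min = -upper_bound % upper_bound;	/* unsigned wraparound */
	for (;;) {
		r = rand32();
		if (r >= min)
			break;
	}
	return r % upper_bound;
}
```

For a power-of-two bound min is 0 and nothing is ever rejected, which is why the extra care only matters for other bounds.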


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.
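
A minimal sketch of why the address family check matters when the address fields share a union. The struct layout, the DEMO_INP_IPV6 flag and demo_pcbnotify6() are hypothetical and much simpler than the real struct inpcb. The point is only that an IPv4 entry's bytes can look like a matching IPv6 address unless the family is tested first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INP_IPV6 0x1	/* hypothetical "this pcb is IPv6" flag */

struct demo_inpcb {
	struct demo_inpcb *next;
	int flags;
	union {			/* v4 and v6 addresses overlap in memory */
		uint8_t v4[4];
		uint8_t v6[16];
	} faddr;
	int notified;
};

/*
 * Mark every IPv6 pcb whose foreign address equals dst and return the
 * number of matches.  IPv4 entries are skipped before the union is
 * read as an IPv6 address.  The next pointer is saved at the top of
 * the loop body so that "continue" always advances the iteration.
 */
static int
demo_pcbnotify6(struct demo_inpcb *head, const uint8_t dst[16])
{
	struct demo_inpcb *inp, *next;
	int n = 0;

	for (inp = head; inp != NULL; inp = next) {
		next = inp->next;
		if ((inp->flags & DEMO_INP_IPV6) == 0)
			continue;	/* family mismatch: do not touch */
		if (memcmp(inp->faddr.v6, dst, sizeof(inp->faddr.v6)) != 0)
			continue;
		inp->notified = 1;
		n++;
	}
	return n;
}
```

An IPv4 counterpart would apply the opposite test and compare only faddr.v4.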


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)
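
A rough illustration of the naming point, assuming the common BSD layout in which the address is a union and accessor macros name the element width. The demo_ types and macros below are hypothetical copies for illustration, not the system's struct in6_addr.

```c
#include <stdint.h>

/* Hypothetical copy of the usual union-based address layout. */
struct demo_in6_addr {
	union {
		uint8_t  u8[16];
		uint32_t u32[4];
	} u;
};

/* Accessor names that state the access size, in the style of s6_addr32. */
#define demo_s6_addr	u.u8
#define demo_s6_addr32	u.u32

/* Test for the unspecified address (::) with four 32-bit comparisons. */
static int
demo_is_unspecified(const struct demo_in6_addr *a)
{
	return a->demo_s6_addr32[0] == 0 && a->demo_s6_addr32[1] == 0 &&
	    a->demo_s6_addr32[2] == 0 && a->demo_s6_addr32[3] == 0;
}
```

A name such as "words" leaves the element width to the reader, while the "32" suffix states it directly.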


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.143 31-Mar-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions.

A previous version of this diff was backed out. There was an
additional rtisvalid() in rtalloc_mpath() that prevented packet
output via interfaces that were not up. Now the route in the cache
has to be valid, but after new lookup, rtalloc_mpath() may return
invalid routes. This generates less errors in userland an preserves
existing behavior.

OK sashan@


# 1.142 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@


Revision tags: OPENBSD_7_5_BASE
# 1.141 29-Feb-2024 naddy

revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"

It breaks NFS.

ok claudio@


# 1.140 27-Feb-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.

OK claudio@


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embeded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD has the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Doucument inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as neccessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also incements the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tupel
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing commited in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
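
The reason a plain modulo is avoided for ranges that are not a power of two
can be sketched as follows. This is a minimal illustration of rejection
sampling, not the kernel's arc4random_uniform() source; uniform_below(),
rng32_fn and demo_rng() are hypothetical names, and demo_rng() is a
deterministic counter used only to make the behaviour easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source of uniformly distributed 32-bit values. */
typedef uint32_t (*rng32_fn)(void);

/* Deterministic stand-in generator for demonstration only. */
static uint32_t demo_state;

uint32_t
demo_rng(void)
{
	return demo_state++;
}

/*
 * Return a value in [0, upper) without modulo bias.
 * Values below (2^32 mod upper) are rejected, so the accepted range
 * has a size that is an exact multiple of upper.
 */
uint32_t
uniform_below(rng32_fn rng, uint32_t upper)
{
	uint32_t min, r;

	assert(rng != NULL);
	if (upper < 2)
		return 0;

	/* 2^32 mod upper, computed in 32-bit unsigned arithmetic. */
	min = (0u - upper) % upper;

	for (;;) {
		r = rng();
		if (r >= min)
			break;
	}
	return r % upper;
}
```

When upper is a power of two, min is 0 and no value is ever rejected, which
is why the plain generator is sufficient in that case.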


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.142 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere with how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@
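
The difference between globally unique ports and ports that are unique per
local address can be sketched as follows. This is a simplified illustration,
not the code of in_pcbpickport(): struct binding, ADDR_ANY, conflicts() and
pick_port() are hypothetical names, the scan is sequential rather than
randomized, and a wildcard binding is assumed to conflict with every address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY 0u	/* hypothetical wildcard address */

/* One existing local binding: address plus port. */
struct binding {
	uint32_t laddr;
	uint16_t lport;
};

/* A port only conflicts if the addresses overlap as well. */
static bool
conflicts(const struct binding *b, uint32_t laddr, uint16_t port)
{
	if (b->lport != port)
		return false;
	return b->laddr == laddr || b->laddr == ADDR_ANY ||
	    laddr == ADDR_ANY;
}

/*
 * Return the first port in [first, last] that is free for laddr,
 * or 0 if every port in the range is taken for that address.
 */
uint16_t
pick_port(const struct binding *tab, size_t n, uint32_t laddr,
    uint16_t first, uint16_t last)
{
	uint32_t port;	/* wider than uint16_t so last == 65535 terminates */
	size_t i;

	assert(first != 0 && first <= last);
	for (port = first; port <= last; port++) {
		for (i = 0; i < n; i++)
			if (conflicts(&tab[i], laddr, (uint16_t)port))
				break;
		if (i == n)
			return (uint16_t)port;
	}
	return 0;
}
```

With one existing binding on address A and port 1024, a request for a
different address B may reuse port 1024, while a request for A itself has to
move on to the next port.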


Revision tags: OPENBSD_7_5_BASE
# 1.141 29-Feb-2024 naddy

revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"

It breaks NFS.

ok claudio@


# 1.140 27-Feb-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.

OK claudio@


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6 in
in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
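
The lookup pattern described above, taking a reference while the table mutex
is still held, can be sketched in userland C. This is an illustration under
assumptions, not the kernel code: struct pcb, struct pcb_table, pcb_lookup()
and pcb_unref() are hypothetical names, a pthread mutex stands in for
table->inpt_mtx, and a C11 atomic counter stands in for the kernel reference
counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct pcb {
	struct pcb *next;
	uint16_t lport;
	atomic_uint refs;	/* one reference is owned by the table */
};

struct pcb_table {
	pthread_mutex_t mtx;	/* protects the list */
	struct pcb *head;
};

/*
 * Find a PCB by local port. The reference is taken before the mutex
 * is released, so the result cannot be freed under the caller.
 */
struct pcb *
pcb_lookup(struct pcb_table *t, uint16_t lport)
{
	struct pcb *p;

	pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->lport == lport) {
			atomic_fetch_add(&p->refs, 1);
			break;
		}
	}
	pthread_mutex_unlock(&t->mtx);
	return p;
}

/* Drop a reference obtained from pcb_lookup(); NULL is accepted. */
void
pcb_unref(struct pcb *p)
{
	unsigned int old;

	if (p == NULL)
		return;
	old = atomic_fetch_sub(&p->refs, 1);
	assert(old > 0);
}
```

Every caller of pcb_lookup() pairs it with pcb_unref() at the end of the
function, which mirrors the "unref it at the end" rule from the commit
message. Freeing the object when the count reaches zero is left out of the
sketch.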


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
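
The symmetry property mentioned above can be illustrated with a toy hash.
This is not the stoeplitz implementation; mix32() and flow_hash() are
hypothetical names, and the mixing constants are arbitrary. The sketch only
shows one simple way to obtain a hash that ignores argument order: combine
both endpoints with XOR before hashing.

```c
#include <stdint.h>

/* Toy mixing function, a stand-in for a real keyed hash. */
static uint16_t
mix32(uint32_t v)
{
	v ^= v >> 16;
	v *= 0x45d9f3bU;
	v ^= v >> 16;
	return (uint16_t)(v ^ (v >> 16));
}

/*
 * Hash a flow so that swapping the foreign and local endpoints gives
 * the same value. XOR is commutative, so the input to mix32() does
 * not depend on which side is called foreign or local.
 */
uint16_t
flow_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	return mix32(faddr ^ laddr ^ (uint32_t)(fport ^ lport));
}
```

With such a function, flow_hash(a, b, p, q) and flow_hash(b, a, q, p) are
equal, so the order of the arguments at a call site affects readability but
not the result.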


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
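
The signature change described above can be sketched as follows. The names
struct addr4 and select_src() are hypothetical and the selection logic is a
placeholder; only the calling convention is the point: the function returns
the error code and hands the chosen source address back through a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address type for illustration. */
struct addr4 {
	uint32_t addr;
};

/*
 * Return 0 and store the selected source in *srcp, or return an
 * errno value and leave *srcp untouched. The "selection" here just
 * accepts the candidate if there is one.
 */
int
select_src(const struct addr4 **srcp, const struct addr4 *candidate)
{
	assert(srcp != NULL);
	if (candidate == NULL)
		return EADDRNOTAVAIL;
	*srcp = candidate;
	return 0;
}
```

A caller can then write "if ((error = select_src(&src, cand)) != 0) return
error;" and does not need a separate NULL check on the returned address.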


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.141 29-Feb-2024 naddy

revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"

It breaks NFS.

ok claudio@


# 1.140 27-Feb-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.

OK claudio@


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
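
The grab-or-hold convention described above can be sketched in userland as follows. Only the names IN_PCBLOCK_GRAB and IN_PCBLOCK_HOLD come from the commit message; the toy mutex, the table layout, the enum values and toy_lookup_lock() are assumptions made for illustration and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; only the two names mirror the commit message. */
enum pcb_lock { PCBLOCK_HOLD, PCBLOCK_GRAB };

/* Toy stand-ins for the table mutex and the PCB. */
struct toy_mutex { int held; };
struct toy_pcb { int port; int refcnt; };
struct toy_table { struct toy_mutex mtx; struct toy_pcb *pcbs; size_t n; };

static void
toy_enter(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = 1;
}

static void
toy_leave(struct toy_mutex *m)
{
	assert(m->held);
	m->held = 0;
}

struct toy_pcb *
toy_lookup_lock(struct toy_table *t, int port, enum pcb_lock lock)
{
	struct toy_pcb *found = NULL;
	size_t i;

	if (lock == PCBLOCK_GRAB)
		toy_enter(&t->mtx);
	else
		assert(t->mtx.held);	/* HOLD: caller already owns the mutex */

	for (i = 0; i < t->n; i++) {
		if (t->pcbs[i].port == port) {
			found = &t->pcbs[i];
			break;
		}
	}

	if (lock == PCBLOCK_GRAB) {
		/* Reference is taken only when the lock is dropped here. */
		if (found != NULL)
			found->refcnt++;
		toy_leave(&t->mtx);
	}
	return found;
}
```

In the HOLD case the result carries no extra reference, because the caller's own critical section keeps the entry alive, which matches the lifetime rule stated in the message.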


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
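
A minimal sketch of the idea above: compute the hash from a constant secret before entering the critical section, so only the bucket walk is serialized. The table layout, the toy mutex flag and all toy_* names are hypothetical, and FNV-1a is used purely as a stand-in for SipHash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8

struct toy_mutex { int held; };
struct toy_pcb { uint32_t key; struct toy_pcb *next; };
struct toy_table {
	struct toy_mutex mtx;
	uint32_t secret;		/* set once at init, never reseeded */
	struct toy_pcb *bucket[TOY_BUCKETS];
};

/* FNV-1a over the four key bytes, seeded with the secret (not SipHash). */
uint32_t
toy_hash(uint32_t secret, uint32_t key)
{
	uint32_t h = 2166136261U ^ secret;
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (key >> (8 * i)) & 0xff;
		h *= 16777619U;
	}
	return h;
}

void
toy_insert(struct toy_table *t, struct toy_pcb *p)
{
	/* Hash depends only on the constant secret: no lock needed yet. */
	uint32_t h = toy_hash(t->secret, p->key) % TOY_BUCKETS;

	t->mtx.held = 1;		/* critical section: list update only */
	p->next = t->bucket[h];
	t->bucket[h] = p;
	t->mtx.held = 0;
}

struct toy_pcb *
toy_lookup(struct toy_table *t, uint32_t key)
{
	uint32_t h = toy_hash(t->secret, key) % TOY_BUCKETS;
	struct toy_pcb *p;

	assert(!t->mtx.held);
	t->mtx.held = 1;		/* critical section: bucket walk only */
	for (p = t->bucket[h]; p != NULL; p = p->next) {
		if (p->key == key)
			break;
	}
	t->mtx.held = 0;
	return p;
}
```

If the secret could change while the table is in use, a hash computed outside the lock might be stale by the time the bucket is walked, which is why the message requires the secret to be constant.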


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
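
The symmetry property mentioned above (swapping the two endpoints yields the same hash) can be demonstrated with a toy function. This is not the Toeplitz computation and does not use the stoeplitz API; toy_mix() and toy_flow_hash() are hypothetical names, and XOR folding is just one simple way to obtain an order-independent result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical integer mixer; a stand-in, not a Toeplitz hash. */
static uint32_t
toy_mix(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x45d9f3bU;
	x ^= x >> 16;
	return x;
}

/*
 * Symmetric flow hash: both endpoints are folded with XOR first, so
 * (foreign, local) and (local, foreign) feed identical input to the mixer.
 */
uint32_t
toy_flow_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	uint32_t addrs = faddr ^ laddr;
	uint32_t ports = (uint32_t)(fport ^ lport);

	return toy_mix(addrs ^ toy_mix(ports));
}
```

Because XOR is commutative, the argument order cannot change the result, which is why a swapped call such as the one fixed here is harmless at runtime.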


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@
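
The ordering problem described above, and the second lookup that fixes it, can be sketched with a toy connection table. All names here (struct toy_conn, toy_tuple_in_use(), toy_connect(), the autoport parameter) are hypothetical, and the local address is left out for brevity; the sketch only mirrors the check, bind, check-again sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_conn {
	uint16_t lport;		/* 0 means not bound yet */
	uint32_t faddr;
	uint16_t fport;
};

/* Return 1 if another entry in the table already uses this tuple. */
static int
toy_tuple_in_use(const struct toy_conn *tab, size_t n,
    const struct toy_conn *self, uint16_t lport, uint32_t faddr,
    uint16_t fport)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (&tab[i] != self && tab[i].lport == lport &&
		    tab[i].faddr == faddr && tab[i].fport == fport)
			return 1;
	}
	return 0;
}

int
toy_connect(struct toy_conn *tab, size_t n, struct toy_conn *c,
    uint32_t faddr, uint16_t fport, uint16_t autoport)
{
	/* First check: only meaningful if the socket is already bound. */
	if (c->lport != 0 &&
	    toy_tuple_in_use(tab, n, c, c->lport, faddr, fport))
		return EADDRINUSE;

	if (c->lport == 0) {
		/* Automatic bind; with address reuse it may pick a busy port. */
		c->lport = autoport;
		/* Second check, now that the local port is known. */
		if (toy_tuple_in_use(tab, n, c, c->lport, faddr, fport)) {
			c->lport = 0;
			return EADDRINUSE;
		}
	}
	c->faddr = faddr;
	c->fport = fport;
	return 0;
}
```

Without the second check, the unbound socket in this sketch would end up with the same tuple as an existing entry, which corresponds to the colliding PCBs the message reports.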


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
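
The calling convention described in this entry (error code as the return value, result handed back through a pointer) can be sketched as below. This is only an illustration under stated assumptions: `ex_selectsrc()`, `struct ex_addr`, and the trivial selection rule are hypothetical and do not reproduce the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type standing in for struct in6_addr. */
struct ex_addr {
	unsigned char bytes[16];
};

/* One fixed "configured" source address for the sketch. */
static const struct ex_addr ex_local = { { 0xfe, 0x80 } };

/*
 * Return 0 and store the chosen source in *srcp on success, or
 * return an errno value and leave *srcp untouched on failure.
 */
int
ex_selectsrc(const struct ex_addr **srcp, const struct ex_addr *dst)
{
	assert(srcp != NULL);
	if (dst == NULL)
		return (EADDRNOTAVAIL);
	*srcp = &ex_local;
	return (0);
}
```

A caller then writes `if ((error = ex_selectsrc(&src, dst)) != 0) return (error);` and never has to test the returned pointer separately.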


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@
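
A minimal sketch of why inline conversion functions give stricter checking than bare casts. The helpers are named `ex_sin6tosa()` and `ex_satosin6()` here to avoid clashing with any system definitions, and the address family assertion is an addition for this example rather than something the entry describes.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Typed conversion helpers. Unlike a bare cast, passing a pointer
 * of the wrong type to one of these draws a compiler diagnostic.
 */
static inline struct sockaddr *
ex_sin6tosa(struct sockaddr_in6 *sin6)
{
	return ((struct sockaddr *)sin6);
}

static inline struct sockaddr_in6 *
ex_satosin6(struct sockaddr *sa)
{
	/* Assumption for this sketch: callers pass AF_INET6 only. */
	assert(sa->sa_family == AF_INET6);
	return ((struct sockaddr_in6 *)sa);
}
```

With these helpers, `ex_satosin6(&some_int)` fails to compile cleanly, whereas `(struct sockaddr_in6 *)&some_int` would be accepted silently.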


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in
distinct routing tables. The same network can now be used in any domain
at the same time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@
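
A sketch of the kind of off-by-one this entry refers to, assuming the bug was a bounds check that admitted an index equal to the table size. `EX_NCMDS`, `ex_errmap`, and `ex_cmd_to_errno()` are hypothetical stand-ins with arbitrary values; the real tables and checks are not shown in this log.

```c
#include <assert.h>

#define EX_NCMDS 4	/* hypothetical stand-in for PRC_NCMDS */

/* Table indexed by command; valid indices are 0 .. EX_NCMDS - 1. */
static const unsigned char ex_errmap[EX_NCMDS] = { 0, 64, 65, 61 };

/*
 * Map a command to an error value, or -1 if it is out of range.
 * Writing "cmd > EX_NCMDS" here would let cmd == EX_NCMDS through
 * and read one element past the end of the table.
 */
int
ex_cmd_to_errno(int cmd)
{
	assert(sizeof(ex_errmap) / sizeof(ex_errmap[0]) == EX_NCMDS);
	if (cmd < 0 || cmd >= EX_NCMDS)
		return (-1);
	return (ex_errmap[cmd]);
}
```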


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.140 27-Feb-2024 bluhm

Combine route_cache() and rtalloc_mpath() in new route_mpath().

Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.

OK claudio@


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameters to const pointers.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
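
The grab-or-hold parameter described above can be sketched with a toy lock. This is an illustration under stated assumptions, not the kernel code: `struct ex_table`, `ex_lookup_lock()`, `ex_bind()`, and the `EX_PCBLOCK_*` values are hypothetical, and an integer flag stands in for the real mutex.

```c
#include <assert.h>
#include <stddef.h>

enum ex_pcblock { EX_PCBLOCK_HOLD, EX_PCBLOCK_GRAB };

/* Toy table; "locked" stands in for the table mutex. */
struct ex_table {
	int	locked;
	int	ports[4];	/* bound local ports, 0 means unused */
};

static void
ex_enter(struct ex_table *t)
{
	assert(!t->locked);
	t->locked = 1;
}

static void
ex_leave(struct ex_table *t)
{
	assert(t->locked);
	t->locked = 0;
}

/*
 * Return 1 if port is bound. With EX_PCBLOCK_GRAB the function takes
 * and releases the lock itself; with EX_PCBLOCK_HOLD the caller must
 * already hold it.
 */
int
ex_lookup_lock(struct ex_table *t, int port, enum ex_pcblock lock)
{
	size_t i;
	int found = 0;

	assert(port != 0);
	if (lock == EX_PCBLOCK_GRAB)
		ex_enter(t);
	else
		assert(t->locked);
	for (i = 0; i < sizeof(t->ports) / sizeof(t->ports[0]); i++)
		if (t->ports[i] == port)
			found = 1;
	if (lock == EX_PCBLOCK_GRAB)
		ex_leave(t);
	return (found);
}

/* Check and insert inside one critical section, as bind must. */
int
ex_bind(struct ex_table *t, int port)
{
	size_t i;
	int error = -1;

	ex_enter(t);
	if (!ex_lookup_lock(t, port, EX_PCBLOCK_HOLD)) {
		for (i = 0; i < sizeof(t->ports) / sizeof(t->ports[0]); i++) {
			if (t->ports[i] == 0) {
				t->ports[i] = port;
				error = 0;
				break;
			}
		}
	}
	ex_leave(t);
	return (error);
}
```

Because the lookup and the insert share one critical section in `ex_bind()`, no second caller can slip in between the uniqueness check and the table update.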


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
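
The rule "increment the ref count before releasing the mutex" can be sketched as below. The types and functions (`struct ex_pcb`, `struct ex_reftable`, `ex_lookup()`, `ex_unref()`) are hypothetical, and an integer flag again stands in for the table mutex.

```c
#include <assert.h>
#include <stddef.h>

struct ex_pcb {
	int	port;
	int	refcnt;
};

struct ex_reftable {
	int		locked;		/* toy stand-in for the mutex */
	struct ex_pcb	*slots[4];
};

/*
 * Look up a PCB and take a reference while the table is still
 * locked, so the object cannot go away once the lock is dropped.
 */
struct ex_pcb *
ex_lookup(struct ex_reftable *t, int port)
{
	struct ex_pcb *inp = NULL;
	size_t i;

	assert(!t->locked);
	t->locked = 1;
	for (i = 0; i < sizeof(t->slots) / sizeof(t->slots[0]); i++) {
		if (t->slots[i] != NULL && t->slots[i]->port == port) {
			inp = t->slots[i];
			inp->refcnt++;	/* taken before unlocking */
			break;
		}
	}
	t->locked = 0;
	return (inp);
}

/* Drop a reference; returns 1 when the last one is gone. */
int
ex_unref(struct ex_pcb *inp)
{
	if (inp == NULL)
		return (0);
	assert(inp->refcnt > 0);
	return (--inp->refcnt == 0);
}
```

Every caller of `ex_lookup()` pairs it with one `ex_unref()` at the end of the function, which mirrors the "unref it at the end" rule in the entry.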


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
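
What "symmetric" means for a flow hash can be shown with a toy function. Note that this is not the Toeplitz algorithm: `ex_flow_hash()` is a hypothetical stand-in that simply XORs the two endpoints before mixing, which is enough to demonstrate that swapping the foreign and local sides leaves the result unchanged.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy symmetric flow hash: combine the two endpoints with XOR before
 * mixing, so swapping (faddr, fport) with (laddr, lport) gives the
 * same value.
 */
uint32_t
ex_flow_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	uint32_t h;

	h = faddr ^ laddr;
	h ^= (uint32_t)(fport ^ lport) << 8;
	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return (h);
}

/* Self-check illustrating the symmetry property. */
void
ex_flow_hash_check(void)
{
	assert(ex_flow_hash(1, 2, 3, 4) == ex_flow_hash(2, 1, 4, 3));
}
```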


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@
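
A sketch of a port scan that reports EADDRNOTAVAIL once the whole range is exhausted. The function `ex_pickport()`, its parameters, and the two predicates are hypothetical; in particular the caller-supplied `start` offset stands in for whatever random starting point the real code uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicates: which ports count as already bound. */
int
ex_odd_busy(uint16_t port)
{
	return (port & 1);
}

int
ex_all_busy(uint16_t port)
{
	(void)port;
	return (1);
}

/*
 * Scan first..last (inclusive), beginning at offset start and
 * wrapping around once. Returns 0 and stores the port in *portp, or
 * EADDRNOTAVAIL if every port in the range is in use.
 */
int
ex_pickport(uint16_t first, uint16_t last, uint32_t start,
    int (*in_use)(uint16_t), uint16_t *portp)
{
	uint32_t count, i, candidate;

	assert(first <= last && in_use != NULL && portp != NULL);
	count = (uint32_t)last - first + 1;
	start %= count;
	for (i = 0; i < count; i++) {
		candidate = first + (start + i) % count;
		if (!in_use((uint16_t)candidate)) {
			*portp = (uint16_t)candidate;
			return (0);
		}
	}
	return (EADDRNOTAVAIL);
}
```

The loop visits each port exactly once, so the error is returned only after the full range has been tried.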


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
  just like "arp -s".
- revise source address selection.
  be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
  packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
  IPv6 model where multiple address on interface is normal.
  (kernel side supports them for a while for backward compat,
  the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
  rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.139 22-Feb-2024 bluhm

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
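
The signature change described above can be sketched in userland C. This is
only an illustration of the "return the error, pass the result through a
pointer" convention; demo_addr, demo_selectsrc() and demo_connect() are
hypothetical names and not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an address type; not the kernel's struct. */
struct demo_addr {
	unsigned char bytes[16];
};

static const struct demo_addr demo_local = { { 0xfe, 0x80 } };

/*
 * New-style signature: the return value is the error code and the
 * selected address comes back through *out.
 */
int
demo_selectsrc(const struct demo_addr **out, int have_route)
{
	assert(out != NULL);
	if (!have_route)
		return (EADDRNOTAVAIL);
	*out = &demo_local;
	return (0);
}

/* Caller: one error path, no separate NULL check on the result. */
int
demo_connect(int have_route)
{
	const struct demo_addr *src;
	int error;

	error = demo_selectsrc(&src, have_route);
	if (error)
		return (error);
	return (src->bytes[0] == 0xfe ? 0 : EINVAL);
}
```

With this shape the caller tests a single value, which is presumably what
made the extra checks in the callers' error paths removable.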


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@
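
A minimal userland sketch of why inline conversion functions give stricter
type checking than bare casts. The demo_ names are hypothetical stand-ins
written for this illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Unlike a bare cast, the parameter type is checked: passing a
 * struct sockaddr_in * here draws a compiler diagnostic.
 */
static inline struct sockaddr *
demo_sin6tosa(struct sockaddr_in6 *sin6)
{
	return ((struct sockaddr *)sin6);
}

/* The reverse direction, again with a checked argument type. */
static inline struct sockaddr_in6 *
demo_satosin6(struct sockaddr *sa)
{
	return ((struct sockaddr_in6 *)sa);
}
```

The cast still exists, but it lives in one place and the call sites only
accept the pointer type they were written for.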


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows interfaces to be bound to
an alternate routing table and separates them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
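
The commit does not spell out the reason, but the usual motivation for a
bounded generator is modulo bias: reducing a 32-bit value with % favours
small results unless the bound is a power of two. The sketch below shows
the rejection technique under that assumption. demo_uniform() and the
counter generator are hypothetical helpers for illustration, not the libc
or kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Any source of uniformly distributed 32-bit values. */
typedef uint32_t (*demo_rng32)(void);

/* Return a value in [0, upper_bound) without modulo bias. */
uint32_t
demo_uniform(demo_rng32 rng, uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return (0);
	/* 2^32 mod upper_bound, computed in 32-bit arithmetic. */
	min = -upper_bound % upper_bound;
	/* Reject the low values that would make some results more likely. */
	for (;;) {
		r = rng();
		if (r >= min)
			break;
	}
	return (r % upper_bound);
}

/* Deterministic stand-in generator so the behaviour can be checked. */
uint32_t demo_state;

uint32_t
demo_counter_rng(void)
{
	return (demo_state++);
}
```

For a power-of-two bound min is 0, so nothing is ever rejected and a plain
mask or modulo would have been unbiased already.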


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
  just like "arp -s".
- revise source address selection.
  be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
  packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
  IPv6 model where multiple addresses on an interface are normal.
  (kernel side supports them for a while for backward compat,
  the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
  rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.138 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
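
The grab-or-hold convention described above can be sketched in userland C
with a pthread mutex. All demo_ names are hypothetical and the reference
count is simply guarded by the table mutex here, which is a simplification
for illustration and not how the kernel is implemented.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names modelled on the commit message, not kernel code. */
enum demo_lock { DEMO_LOCK_GRAB, DEMO_LOCK_HOLD };

struct demo_pcb {
	int		 port;
	unsigned int	 refcnt;	/* guarded by the table mutex here */
	struct demo_pcb	*next;
};

struct demo_table {
	pthread_mutex_t	 mtx;
	struct demo_pcb	*head;
};

/*
 * GRAB: take and release the mutex, return a referenced entry.
 * HOLD: caller already owns the mutex and keeps the entry alive itself.
 */
struct demo_pcb *
demo_lookup_lock(struct demo_table *t, int port, enum demo_lock lock)
{
	struct demo_pcb *p;

	if (lock == DEMO_LOCK_GRAB)
		pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port)
			break;
	}
	if (lock == DEMO_LOCK_GRAB) {
		if (p != NULL)
			p->refcnt++;
		pthread_mutex_unlock(&t->mtx);
	}
	return (p);
}

/* A bind-style caller: check and insert inside one critical section. */
int
demo_bind(struct demo_table *t, struct demo_pcb *newp)
{
	int inuse;

	assert(newp != NULL);
	pthread_mutex_lock(&t->mtx);
	inuse = demo_lookup_lock(t, newp->port, DEMO_LOCK_HOLD) != NULL;
	if (!inuse) {
		newp->next = t->head;
		t->head = newp;
	}
	pthread_mutex_unlock(&t->mtx);
	return (inuse ? -1 : 0);
}
```

Because the lookup and the insert share one critical section in
demo_bind(), two callers cannot both conclude that the same port is free.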


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
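
The symmetry property mentioned above can be demonstrated with a small
Toeplitz hash whose key is a 16-bit pattern repeated over the whole key.
This is a sketch written for illustration: demo_stoeplitz() and
demo_flow_hash() are hypothetical names, the seed value is arbitrary, and
the code is not the kernel's stoeplitz implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash over len input bytes. The key is the 16-bit seed
 * repeated forever, so the 32-bit key window at input bit offset i
 * depends only on i mod 16.
 */
uint32_t
demo_stoeplitz(uint16_t seed, const uint8_t *data, size_t len)
{
	/* 48 bits of repeated seed: enough to slide a 32-bit window by 15. */
	uint64_t key = ((uint64_t)seed << 32) | ((uint64_t)seed << 16) | seed;
	uint32_t hash = 0;
	size_t i;
	int b;

	for (i = 0; i < len; i++) {
		for (b = 0; b < 8; b++) {
			if (data[i] & (0x80 >> b)) {
				unsigned int off =
				    (unsigned int)((i * 8 + b) % 16);
				/* XOR in the key window for this set bit. */
				hash ^= (uint32_t)(key >> (16 - off));
			}
		}
	}
	return (hash);
}

/* Hash an IPv4 flow laid out as faddr, laddr, fport, lport. */
uint32_t
demo_flow_hash(uint16_t seed, uint32_t faddr, uint32_t laddr,
    uint16_t fport, uint16_t lport)
{
	uint8_t buf[12] = {
		(uint8_t)(faddr >> 24), (uint8_t)(faddr >> 16),
		(uint8_t)(faddr >> 8), (uint8_t)faddr,
		(uint8_t)(laddr >> 24), (uint8_t)(laddr >> 16),
		(uint8_t)(laddr >> 8), (uint8_t)laddr,
		(uint8_t)(fport >> 8), (uint8_t)fport,
		(uint8_t)(lport >> 8), (uint8_t)lport
	};

	return (demo_stoeplitz(seed, buf, sizeof(buf)));
}
```

The two addresses sit 32 bits apart and the two ports 16 bits apart, both
multiples of the key period, so swapping them selects the same key windows
and yields the same hash.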


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
an alternate routing table and separating them from other interfaces in
distinct routing tables. The same network can now be used in any domain
at the same time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
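
A minimal userland sketch of the calling convention described above:
the function returns the error code and hands the selected address
back through a pointer. The names toy_selectsrc() and struct toy_addr
are made up for illustration; this is not the kernel signature of
in_selectsrc() or in6_selectsrc().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type; stands in for struct in_addr/in6_addr. */
struct toy_addr {
	unsigned int	a;
};

/*
 * Return 0 on success or an errno value on failure.  The selected
 * source address is handed back through *out, which is written only
 * on success.
 */
static int
toy_selectsrc(struct toy_addr **out, struct toy_addr *cand, size_t n)
{
	assert(out != NULL);
	if (n == 0)
		return (EADDRNOTAVAIL);
	*out = &cand[0];	/* toy policy: first candidate wins */
	return (0);
}
```

With this shape a caller can write "if ((error = toy_selectsrc(...)))
return (error);" and needs no separate NULL check on the result.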


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when passing
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in
distinct routing tables. The same network can now be used in any domain
at the same time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.137 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.136 09-Feb-2024 bluhm

Route cache function returns hit or miss.

The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.

OK claudio@


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
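
The grab-or-hold convention above can be sketched in userland C. This
is an illustration only, under stated assumptions: struct toy_mtx is a
single-threaded stand-in for a real mutex, and toy_lookup_lock(),
TOY_LOCK_GRAB and TOY_LOCK_HOLD are hypothetical names, not the kernel
API. The sketch shows that the lookup takes a reference only when it
takes and releases the lock itself.

```c
#include <assert.h>
#include <stddef.h>

enum toy_lock { TOY_LOCK_GRAB, TOY_LOCK_HOLD };

/* Single-threaded stand-in for a mutex; only tracks ownership. */
struct toy_mtx { int held; };
static void toy_enter(struct toy_mtx *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mtx *m) { assert(m->held); m->held = 0; }

struct toy_pcb {
	unsigned short	lport;
	int		refcnt;
};

struct toy_table {
	struct toy_mtx	 mtx;
	struct toy_pcb	*pcbs;
	size_t		 n;
};

static struct toy_pcb *
toy_lookup_lock(struct toy_table *t, unsigned short lport, enum toy_lock lock)
{
	struct toy_pcb *found = NULL;
	size_t i;

	if (lock == TOY_LOCK_GRAB)
		toy_enter(&t->mtx);
	else
		assert(t->mtx.held);	/* HOLD: caller already owns it */

	for (i = 0; i < t->n; i++) {
		if (t->pcbs[i].lport == lport) {
			found = &t->pcbs[i];
			break;
		}
	}

	if (lock == TOY_LOCK_GRAB) {
		/* The result outlives the lock, so pin it first. */
		if (found != NULL)
			found->refcnt++;
		toy_leave(&t->mtx);
	}
	/* In the HOLD case the caller's lock protects the lifetime. */
	return (found);
}
```

A caller that already holds the table mutex for a bind or connect
sequence passes TOY_LOCK_HOLD and keeps the whole sequence in one
critical section; everyone else passes TOY_LOCK_GRAB and drops the
reference when done.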


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
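
A rough userland sketch of the idea, not the kernel code: the hash is
a pure function of a secret that never changes after initialization,
so it can be computed before the table lock is taken, and only the
bucket walk happens inside the critical section. All names are
hypothetical, struct toy_mtx is a single-threaded stand-in for a
mutex, and the multiplicative hash is a placeholder for SipHash.
Reference counting of the result is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NBUCKETS	8

/* Single-threaded stand-in for a mutex; only tracks ownership. */
struct toy_mtx { int held; };
static void toy_enter(struct toy_mtx *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mtx *m) { assert(m->held); m->held = 0; }

struct toy_pcb {
	uint16_t	 lport;
	struct toy_pcb	*next;
};

struct toy_table {
	struct toy_mtx	 mtx;
	uint32_t	 secret;	/* set once, never reseeded */
	struct toy_pcb	*buckets[TOY_NBUCKETS];
};

/* Placeholder hash; depends only on the constant secret and the key. */
static uint32_t
toy_hash(uint32_t secret, uint16_t lport)
{
	return ((secret ^ lport) * 2654435761u);
}

static void
toy_insert(struct toy_table *t, struct toy_pcb *p)
{
	uint32_t h = toy_hash(t->secret, p->lport);	/* outside the lock */

	toy_enter(&t->mtx);
	p->next = t->buckets[h % TOY_NBUCKETS];
	t->buckets[h % TOY_NBUCKETS] = p;
	toy_leave(&t->mtx);
}

static struct toy_pcb *
toy_lookup(struct toy_table *t, uint16_t lport)
{
	uint32_t h = toy_hash(t->secret, lport);	/* outside the lock */
	struct toy_pcb *p;

	toy_enter(&t->mtx);
	for (p = t->buckets[h % TOY_NBUCKETS]; p != NULL; p = p->next)
		if (p->lport == lport)
			break;
	toy_leave(&t->mtx);
	return (p);
}
```

The reduction to a bucket index stays inside the lock because the
number of buckets may change on resize, while the hash value does not.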


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.135 07-Feb-2024 bluhm

Use the route generation number also for IPv6.

Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.

OK claudio@


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to an
alternate routing table and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.134 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.133 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
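
The calling convention described in 1.62 can be illustrated with a small sketch. This is not the kernel's in{,6}_selectsrc(): the struct addr type, the function name select_src() and its single "candidate" parameter are hypothetical and exist only to show the shape of "return the error code, pass the result back through a pointer".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type, a stand-in for struct in6_addr. */
struct addr {
	unsigned char bytes[16];
};

/*
 * Return 0 and store the selected source address in *out on success,
 * or return an errno value and leave *out untouched on failure.
 * The selection logic is reduced to "use the candidate if there is one".
 */
int
select_src(const struct addr **out, const struct addr *candidate)
{
	assert(out != NULL);

	if (candidate == NULL)
		return EADDRNOTAVAIL;

	*out = candidate;
	return 0;
}
```

A caller then writes "error = select_src(&src, cand); if (error) return error;" and does not need a separate NULL check on the returned address.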


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
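
Why 1.44 matters for bounds that are not a power of two can be shown with a sketch of rejection sampling. This is an illustration of the general technique, not the arc4random_uniform() source. The names uniform_below(), rng32_fn and counting_rng() are hypothetical, and counting_rng() is a deterministic stand-in used only so the example is reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit random source, standing in for arc4random(). */
typedef uint32_t (*rng32_fn)(void);

/* Deterministic stand-in for demonstration only: returns 0, 1, 2, ... */
uint32_t
counting_rng(void)
{
	static uint32_t next;

	return next++;
}

/*
 * Return a value in [0, bound) without modulo bias.
 * Raw values below "min" are rejected, so the accepted range
 * [min, 2^32) has a length that is an exact multiple of bound.
 */
uint32_t
uniform_below(rng32_fn rng, uint32_t bound)
{
	uint32_t min, r;

	assert(rng != NULL);

	if (bound < 2)
		return 0;

	/* 2^32 % bound, computed in 32-bit unsigned arithmetic. */
	min = -bound % bound;

	do {
		r = rng();
	} while (r < min);

	return r % bound;
}
```

A plain "rng() % bound" would favour the low results whenever 2^32 is not a multiple of bound; when bound is a power of two, min is 0 and no value is ever rejected.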


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.132 09-Jan-2024 bluhm

Convert some struct inpcb parameters to const pointers.

OK millert@


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
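
The grab/hold convention from 1.131 can be sketched in userland C with a pthread mutex. This is a simplified model, not the kernel code: the names PCBLOCK_GRAB, PCBLOCK_HOLD, struct pcb, struct pcb_table, pcb_lookup_lock() and pcb_unref() are hypothetical, and a single linked list stands in for the hash table.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock modes modeled on IN_PCBLOCK_GRAB / IN_PCBLOCK_HOLD. */
enum pcb_lock { PCBLOCK_GRAB, PCBLOCK_HOLD };

struct pcb {
	struct pcb	*next;
	uint16_t	 port;
	unsigned int	 refcnt;	/* protected by the table mutex */
};

struct pcb_table {
	pthread_mutex_t	 mtx;		/* protects the list and refcnt */
	struct pcb	*head;
};

/*
 * With PCBLOCK_GRAB the function takes and releases the mutex itself
 * and returns the pcb with an extra reference. With PCBLOCK_HOLD the
 * caller already holds the mutex and protects the pcb's lifetime, so
 * no reference is taken.
 */
struct pcb *
pcb_lookup_lock(struct pcb_table *t, uint16_t port, enum pcb_lock lock)
{
	struct pcb *p;

	if (lock == PCBLOCK_GRAB)
		pthread_mutex_lock(&t->mtx);

	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port)
			break;
	}

	if (lock == PCBLOCK_GRAB) {
		if (p != NULL)
			p->refcnt++;
		pthread_mutex_unlock(&t->mtx);
	}
	return p;
}

/* Drop a reference obtained from a PCBLOCK_GRAB lookup. */
void
pcb_unref(struct pcb_table *t, struct pcb *p)
{
	pthread_mutex_lock(&t->mtx);
	assert(p->refcnt > 0);
	p->refcnt--;
	pthread_mutex_unlock(&t->mtx);
}
```

The design point is that one lookup body serves both kinds of caller: code paths that must keep lookup and update in one critical section pass the hold mode, everyone else passes the grab mode and later calls pcb_unref().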


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
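
The idea in 1.124, computing the hash before taking the table mutex, can be sketched as follows. This is a userland model under stated assumptions: struct pcb, struct pcb_table, pcb_hash(), pcb_insert() and pcb_lookup() are hypothetical names, the mixing function is a toy stand-in for SipHash, and the lifetime of the returned pcb is not handled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8

struct pcb {
	struct pcb	*next;
	uint32_t	 addr;
	uint16_t	 port;
};

struct pcb_table {
	pthread_mutex_t	 mtx;			/* protects buckets[] */
	uint32_t	 secret;		/* constant after init */
	struct pcb	*buckets[NBUCKETS];
};

void
pcb_table_init(struct pcb_table *t, uint32_t secret)
{
	int i;

	pthread_mutex_init(&t->mtx, NULL);
	t->secret = secret;
	for (i = 0; i < NBUCKETS; i++)
		t->buckets[i] = NULL;
}

/* Toy keyed hash. It reads only the constant secret, so no lock is needed. */
uint32_t
pcb_hash(const struct pcb_table *t, uint32_t addr, uint16_t port)
{
	uint32_t h = t->secret ^ addr ^ port;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h % NBUCKETS;
}

void
pcb_insert(struct pcb_table *t, struct pcb *p)
{
	uint32_t h;

	assert(p != NULL);
	h = pcb_hash(t, p->addr, p->port);	/* outside the critical section */

	pthread_mutex_lock(&t->mtx);
	p->next = t->buckets[h];
	t->buckets[h] = p;
	pthread_mutex_unlock(&t->mtx);
}

struct pcb *
pcb_lookup(struct pcb_table *t, uint32_t addr, uint16_t port)
{
	uint32_t h = pcb_hash(t, addr, port);	/* outside the critical section */
	struct pcb *p;

	pthread_mutex_lock(&t->mtx);
	for (p = t->buckets[h]; p != NULL; p = p->next) {
		if (p->addr == addr && p->port == port)
			break;
	}
	pthread_mutex_unlock(&t->mtx);
	return p;
}
```

The sketch only works because the secret and the bucket count never change after initialization. If either could change under the mutex, the hash would have to be computed inside the critical section again.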


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
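
The symmetry property mentioned in 1.112 can be demonstrated with a much simpler hash than Toeplitz. The sketch below is not the stoeplitz implementation. The function sym_flow_hash() is hypothetical and only shows why swapping foreign and local arguments leaves a symmetric hash unchanged.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical symmetric flow hash. Addresses and ports are combined
 * with XOR, which is commutative, before any further mixing. Swapping
 * (faddr, fport) with (laddr, lport) therefore gives the same value.
 */
uint16_t
sym_flow_hash(uint32_t faddr, uint32_t laddr, uint16_t fport, uint16_t lport)
{
	uint32_t x = faddr ^ laddr;
	uint16_t p = fport ^ lport;

	x ^= x >> 16;		/* fold the high half into the low half */
	return (uint16_t)(x ^ p);
}
```

Because both directions of a connection hash to the same value, a receive queue chosen by the NIC and a transmit queue chosen by the stack can line up, which is the effect 1.111 relies on.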


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and had a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
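
The calling convention described in this entry (return the error code, hand the selected address back through a pointer) can be sketched as follows. This is only an illustration: ex_select_src(), struct ex_addr and the have_route flag are hypothetical names and do not reproduce the kernel's in{,6}_selectsrc() signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type, standing in for struct in6_addr. */
struct ex_addr {
	unsigned char bytes[16];
};

/* Hypothetical "selected" source address used by the sketch. */
static const struct ex_addr ex_src = { { 0xfe, 0x80 } };

/*
 * Return 0 and store the chosen source address in *out, or return an
 * errno value and leave *out untouched.  The caller only has to test
 * the return value, not a separate error variable plus a NULL pointer.
 */
int
ex_select_src(const struct ex_addr **out, int have_route)
{
	if (!have_route)
		return (EADDRNOTAVAIL);
	*out = &ex_src;
	return (0);
}
```

A caller then writes "if ((error = ex_select_src(&addr, ...)) != 0) return (error);" and can rely on addr being valid afterwards.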


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@
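
A minimal userland sketch of why inline conversion functions check more than a bare cast: the argument type is fixed by the prototype, so passing the wrong pointer type draws a compiler diagnostic. The ex_ prefixed names are assumptions made for this example; the kernel's own sin6tosa() and satosin6() may differ in detail.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Only a struct sockaddr_in6 pointer is accepted here. */
static inline struct sockaddr *
ex_sin6tosa(struct sockaddr_in6 *sin6)
{
	return ((struct sockaddr *)sin6);
}

/* Only a struct sockaddr pointer is accepted here. */
static inline struct sockaddr_in6 *
ex_satosin6(struct sockaddr *sa)
{
	return ((struct sockaddr_in6 *)sa);
}
```

With a plain "(struct sockaddr *)ptr" cast, any pointer type would be accepted silently; with ex_sin6tosa(ptr) the compiler complains unless ptr really is a struct sockaddr_in6 pointer.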


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separates them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
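
The reason a plain "random % n" is avoided when n is not a power of two is modulo bias: 2^32 is not a multiple of n, so some remainders occur slightly more often. The sketch below shows the usual rejection-sampling remedy. It is an illustration under stated assumptions: ex_rand32() is a stand-in xorshift generator (not cryptographically secure, and not arc4random()), and ex_uniform() is a hypothetical name, not the libc or kernel implementation.

```c
#include <stdint.h>

/* Stand-in 32-bit generator (xorshift32); NOT suitable for security use. */
static uint32_t ex_state = 2463534242u;

static uint32_t
ex_rand32(void)
{
	ex_state ^= ex_state << 13;
	ex_state ^= ex_state >> 17;
	ex_state ^= ex_state << 5;
	return (ex_state);
}

/* Return a value in [0, upper_bound) without modulo bias. */
uint32_t
ex_uniform(uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return (0);

	/* min = 2^32 mod upper_bound, computed in 32-bit arithmetic. */
	min = -upper_bound % upper_bound;

	/*
	 * Discard the lowest "min" values so that the remaining range
	 * is an exact multiple of upper_bound.
	 */
	do {
		r = ex_rand32();
	} while (r < min);

	return (r % upper_bound);
}
```

For a power-of-two bound, min is 0 and no value is ever rejected, which is why the special handling only matters for other bounds.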


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.131 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
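
The grab/hold convention described in this entry can be sketched in userland C with a pthread mutex. Everything here is an assumption made for illustration: the ex_ names, the flat array instead of a hash table, and the plain integer reference count are not the kernel's data structures.

```c
#include <pthread.h>
#include <stddef.h>

enum ex_lock { EX_LOCK_GRAB, EX_LOCK_HOLD };

struct ex_pcb {
	unsigned short	lport;
	int		refcnt;
};

struct ex_pcbtable {
	pthread_mutex_t	 mtx;
	struct ex_pcb	*slots;
	size_t		 nslots;
};

/*
 * Look up a PCB by local port.
 * EX_LOCK_GRAB: take and release the table mutex here and return the
 *   entry with an extra reference, so it stays valid after unlock.
 * EX_LOCK_HOLD: the caller already holds the mutex and is responsible
 *   for the lifetime of the returned entry; no reference is taken.
 */
struct ex_pcb *
ex_lookup_local(struct ex_pcbtable *t, unsigned short lport, enum ex_lock lock)
{
	struct ex_pcb *found = NULL;
	size_t i;

	if (lock == EX_LOCK_GRAB)
		pthread_mutex_lock(&t->mtx);

	for (i = 0; i < t->nslots; i++) {
		if (t->slots[i].lport == lport) {
			found = &t->slots[i];
			break;
		}
	}

	if (lock == EX_LOCK_GRAB) {
		if (found != NULL)
			found->refcnt++;	/* taken while still locked */
		pthread_mutex_unlock(&t->mtx);
	}
	return (found);
}
```

A bind-like caller can take the mutex once, call ex_lookup_local() with EX_LOCK_HOLD, and then set address and port in the same critical section, while callers outside that section keep using EX_LOCK_GRAB.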


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
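
A rough userland sketch of the idea in this entry: compute the keyed hash before entering the critical section, which is only safe if the secret never changes after initialization. The ex_ names are hypothetical, the table only counts entries per bucket, and the hash is a stand-in (FNV-1a mixed with the secret), not SipHash.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NBUCKETS	64

struct ex_htable {
	pthread_mutex_t	mtx;
	uint64_t	secret;			/* set once, never reseeded */
	int		buckets[EX_NBUCKETS];	/* entries per bucket */
};

/* Stand-in keyed hash: FNV-1a seeded with the secret, not SipHash. */
static uint64_t
ex_hash(uint64_t secret, const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = 14695981039346656037ULL ^ secret;

	while (len-- > 0) {
		h ^= *p++;
		h *= 1099511628211ULL;
	}
	return (h);
}

void
ex_insert(struct ex_htable *t, const void *key, size_t len)
{
	/*
	 * The expensive part runs without the mutex.  It reads only the
	 * constant secret, so no lock-protected state is involved.
	 */
	size_t b = ex_hash(t->secret, key, len) % EX_NBUCKETS;

	/* Only the short table update is serialized. */
	pthread_mutex_lock(&t->mtx);
	t->buckets[b]++;
	pthread_mutex_unlock(&t->mtx);
}
```

If the secret were reseeded on resize, the hash would depend on state protected by the mutex and would have to move back inside the critical section.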


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
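
The symmetry mentioned in this entry can be demonstrated with a small sketch. A Toeplitz hash XORs together 32-bit windows of the key, one per set input bit; if the key is a 16-bit pattern repeated, windows that start 16 or 32 bits apart are identical, so swapping the two addresses or the two ports does not change the result. This is an illustration only: the ex_ names and the seed value are assumptions, and the code is not the kernel's stoeplitz implementation.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t
ex_rotl32(uint32_t x, unsigned int r)
{
	return (r == 0 ? x : (x << r) | (x >> (32 - r)));
}

/* Toeplitz hash of buf using a key that repeats a 16-bit seed. */
uint32_t
ex_stoeplitz(uint16_t seed, const uint8_t *buf, size_t len)
{
	uint32_t key = ((uint32_t)seed << 16) | seed;
	uint32_t hash = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		for (bit = 0; bit < 8; bit++) {
			/*
			 * The key window for input bit n starts at key
			 * bit n; with a 16-bit period only n % 16 matters.
			 */
			if (buf[i] & (0x80 >> bit))
				hash ^= ex_rotl32(key,
				    (unsigned int)((i * 8 + bit) % 16));
		}
	}
	return (hash);
}

/* Hash an IPv4 address pair and port pair laid out back to back. */
uint32_t
ex_flowid(uint16_t seed, uint32_t a1, uint32_t a2, uint16_t p1, uint16_t p2)
{
	uint8_t buf[12] = {
		(uint8_t)(a1 >> 24), (uint8_t)(a1 >> 16),
		(uint8_t)(a1 >> 8), (uint8_t)a1,
		(uint8_t)(a2 >> 24), (uint8_t)(a2 >> 16),
		(uint8_t)(a2 >> 8), (uint8_t)a2,
		(uint8_t)(p1 >> 8), (uint8_t)p1,
		(uint8_t)(p2 >> 8), (uint8_t)p2
	};

	return (ex_stoeplitz(seed, buf, sizeof(buf)));
}
```

Calling ex_flowid() with (faddr, laddr, fport, lport) or with (laddr, faddr, lport, fport) therefore yields the same value, which is why the argument swap in this commit is a consistency fix rather than a behavior change.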


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
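
The calling convention described in 1.62 can be illustrated with a small user-space sketch. The names struct addr, local_addr and select_src() are hypothetical and are not the kernel's in6_selectsrc(); the point is only that the return value carries the error code while the chosen address comes back through a pointer argument.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type, standing in for struct in6_addr. */
struct addr {
	unsigned char bytes[16];
};

/* Hypothetical "internal" address that the selector hands out. */
static const struct addr local_addr = { { 0xfe, 0x80 } };

/*
 * Inverted style: the return value is the error code and the selected
 * source address is passed back through *out. On error *out is left
 * untouched, so the caller only has to check the return value.
 */
int
select_src(const struct addr **out, int have_route)
{
	if (!have_route)
		return EHOSTUNREACH;
	*out = &local_addr;
	return 0;
}
```

A caller then writes `if ((error = select_src(&src, ...)) != 0) return error;` instead of testing a returned pointer for NULL and reading the error from a side channel.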


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
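
Why a dedicated call helps when the bound is not a power of two: a plain `r % n` over a 32-bit value is slightly biased unless n divides 2^32. A minimal sketch of the usual rejection-sampling remedy follows. The names uniform_below(), rand32_fn and demo_rand32() are hypothetical, and the demo generator is a toy LCG used only to make the example deterministic, not a substitute for arc4random().

```c
#include <stdint.h>

/* Hypothetical: any source of uniformly distributed 32-bit values. */
typedef uint32_t (*rand32_fn)(void);

/* Toy LCG for demonstration only; NOT suitable for real use. */
static uint32_t demo_state = 1;

uint32_t
demo_rand32(void)
{
	demo_state = demo_state * 1664525u + 1013904223u;
	return demo_state;
}

/* Return a value in [0, upper_bound) without modulo bias. */
uint32_t
uniform_below(rand32_fn rand32, uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;

	/* 2^32 mod upper_bound; draws below this would skew the result. */
	min = -upper_bound % upper_bound;

	/* Reject the short, biased range and retry. */
	do {
		r = rand32();
	} while (r < min);

	return r % upper_bound;
}
```

For a power-of-two bound, min is 0 and no draw is ever rejected, which is why only the other cases need this treatment.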


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
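
The lookup pattern described in 1.119 can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the kernel code: struct pcb, struct pcb_table, pcb_lookup() and pcb_unref() are hypothetical names, a pthread mutex stands in for table->inpt_mtx, and the counter is a plain integer guarded by that mutex rather than the kernel's reference counting primitive.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inpcb and struct inpcbtable. */
struct pcb {
	struct pcb	*next;
	unsigned int	 port;
	unsigned int	 refcnt;	/* guarded by the table mutex here */
};

struct pcb_table {
	pthread_mutex_t	 mtx;
	struct pcb	*head;
};

/*
 * Find a PCB by port. On success the reference count is bumped while
 * the table mutex is still held, so the PCB cannot go away between
 * the lookup and the caller's use of it.
 */
struct pcb *
pcb_lookup(struct pcb_table *t, unsigned int port)
{
	struct pcb *p;

	pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port) {
			p->refcnt++;	/* take ref before unlocking */
			break;
		}
	}
	pthread_mutex_unlock(&t->mtx);
	return p;
}

/* Drop a reference taken by pcb_lookup(); returns the remaining count. */
unsigned int
pcb_unref(struct pcb_table *t, struct pcb *p)
{
	unsigned int n;

	pthread_mutex_lock(&t->mtx);
	n = --p->refcnt;
	pthread_mutex_unlock(&t->mtx);
	return n;
}
```

Every successful pcb_lookup() in this sketch must be paired with one pcb_unref() at the end of the calling function, mirroring the "unref it at the end" rule in the commit message.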


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to an
alternate routing table and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.130 03-Dec-2023 bluhm

Rename all in6p local variables to inp.

There exists no struct in6pcb in OpenBSD, this was an old kame idea.
Calling the local variable in6p does not make sense, it is actually
a struct inpcb. Also in6p is not used consistently in inet6 code.
Having the same convention for IPv4 and IPv6 is less confusing.

OK sashan@ mvs@


# 1.129 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to an
alternate routing table and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.128 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.127 01-Dec-2023 bluhm

Make internet PCB connect more consistent.

The public interface is in_pcbconnect(). It dispatches to
in6_pcbconnect() if necessary. Call the former from tcp_connect()
and udp_connect().
In in6_pcbconnect() initialization in6a = NULL is not necessary.
in6_pcbselsrc() sets the pointer, but does not read the value.
Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc().
It returns a reference to the address of some internal data structure.
We want to be sure that in6_addr is not modified this way. IPv4
in_pcbselsrc() solves this by passing a copy of the address.

OK kn@ sashan@ mvs@


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL for the
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
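
The symmetry mentioned above can be illustrated with a Toeplitz hash whose key is a repeating 16-bit pattern: swapping the two addresses moves input bits by 32 positions and swapping the two ports moves them by 16, so every set bit still selects the same key window. This is a minimal sketch under that assumption, not the kernel's stoeplitz code; `key_window`, `toeplitz_hash`, `flow_hash` and the sample pattern used in any call are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit window of a key built by repeating a 16-bit pattern,
 * starting at input bit position "bit" (counted from the MSB).
 */
static uint32_t
key_window(uint16_t pattern, unsigned int bit)
{
	/* Three copies of the pattern cover any window at offset 0..15. */
	uint64_t k = ((uint64_t)pattern << 32) | ((uint64_t)pattern << 16) |
	    (uint64_t)pattern;
	unsigned int off = bit % 16;

	return (uint32_t)(k >> (16 - off));
}

/* Plain Toeplitz hash: XOR the key window of every set input bit. */
uint32_t
toeplitz_hash(uint16_t pattern, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i < len * 8; i++) {
		if (data[i / 8] & (0x80 >> (i % 8)))
			hash ^= key_window(pattern, (unsigned int)i);
	}
	return hash;
}

/* Hash a 4-tuple laid out as faddr, laddr, fport, lport (big endian). */
uint32_t
flow_hash(uint16_t pattern, uint32_t faddr, uint32_t laddr,
    uint16_t fport, uint16_t lport)
{
	uint8_t buf[12];

	buf[0] = (uint8_t)(faddr >> 24);
	buf[1] = (uint8_t)(faddr >> 16);
	buf[2] = (uint8_t)(faddr >> 8);
	buf[3] = (uint8_t)faddr;
	buf[4] = (uint8_t)(laddr >> 24);
	buf[5] = (uint8_t)(laddr >> 16);
	buf[6] = (uint8_t)(laddr >> 8);
	buf[7] = (uint8_t)laddr;
	buf[8] = (uint8_t)(fport >> 8);
	buf[9] = (uint8_t)fport;
	buf[10] = (uint8_t)(lport >> 8);
	buf[11] = (uint8_t)lport;

	return toeplitz_hash(pattern, buf, sizeof(buf));
}
```

With such a key, `flow_hash(k, a, b, p, q)` and `flow_hash(k, b, a, q, p)` are equal for any inputs, which is why the argument order is a matter of consistency rather than correctness.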


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
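
The calling convention described above (error code as return value, result handed back through a pointer) can be sketched as follows. This is an illustration of the pattern only; `struct addr` and `pick_source` are hypothetical and do not reproduce the in{,6}_selectsrc() prototypes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical address type for the example. */
struct addr {
	unsigned char bytes[16];
};

/*
 * Return 0 and store the chosen source in *srcp, or return an errno
 * value and leave *srcp untouched. The result is a const pointer, so
 * the caller cannot modify the selected address through it.
 */
int
pick_source(const struct addr **srcp, const struct addr *candidates,
    size_t n)
{
	assert(srcp != NULL);
	if (candidates == NULL || n == 0)
		return EADDRNOTAVAIL;

	*srcp = &candidates[0];
	return 0;
}
```

A caller then needs a single check, `if ((error = pick_source(&src, list, n)) != 0) return error;`, instead of testing a returned pointer and a separate error variable.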


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

witch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
  just like "arp -s".
- revise source address selection.
  be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
  packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
  IPv6 model where multiple address on interface is normal.
  (kernel side supports them for a while for backward compat,
  the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
  rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.126 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.125 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


Revision tags: OPENBSD_7_4_BASE
# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Input from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing are pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.124 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL for the
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Input from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to an
alternate routing table and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing are pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.123 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.122 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as neccessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also incements the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@
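
The behaviour described above can be sketched as a scan over the range portfirst .. portlast that gives up with EADDRNOTAVAIL once every port has been tried. This is a minimal illustration under stated assumptions, not the kernel code: pick_port(), the port_in_use_fn callback and the two example predicates are hypothetical names, and the random starting point is passed in by the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: returns nonzero if the port is already taken. */
typedef int (*port_in_use_fn)(uint16_t port, void *arg);

/*
 * Try every port in [first, last] exactly once, starting at an
 * arbitrary offset and wrapping around.  Store the first free port
 * in *portp, or fail with EADDRNOTAVAIL if all of them are in use.
 */
int
pick_port(uint16_t first, uint16_t last, uint32_t start_offset,
    port_in_use_fn in_use, void *arg, uint16_t *portp)
{
	uint32_t count, start, i, candidate;

	if (first > last)
		return EINVAL;
	count = (uint32_t)last - first + 1;
	start = start_offset % count;
	for (i = 0; i < count; i++) {
		candidate = first + (start + i) % count;
		if (!in_use((uint16_t)candidate, arg)) {
			*portp = (uint16_t)candidate;
			return 0;
		}
	}
	return EADDRNOTAVAIL;
}

/* Example predicate: every port is busy. */
int
all_busy(uint16_t port, void *arg)
{
	(void)port;
	(void)arg;
	return 1;
}

/* Example predicate: every port is busy except the one arg points to. */
int
busy_except(uint16_t port, void *arg)
{
	return port != *(uint16_t *)arg;
}
```

Bounding the loop by the size of the range is what guarantees termination when the range is exhausted.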


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
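
The reason a non-power-of-two bound needs special care is modulo bias: reducing a 32-bit random value with % favours the low results unless the bound divides 2**32. A minimal sketch of the usual rejection approach follows. It is not the libc or kernel implementation; uniform_below() and toy_random32() are hypothetical names, and the xorshift generator only stands in for a real random source and is not secure.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit random source (xorshift32, NOT secure). */
uint32_t
toy_random32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Return a value in [0, upper_bound) without modulo bias. */
uint32_t
uniform_below(uint32_t upper_bound, uint32_t *state)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;
	/* 2**32 % upper_bound; draws below this would skew the result. */
	min = -upper_bound % upper_bound;
	do {
		r = toy_random32(state);
	} while (r < min);
	return r % upper_bound;
}
```

For a power-of-two bound min is 0 and no draw is ever rejected, which is why plain masking is sufficient in that case.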


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.121 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCB need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
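
The pattern described above, taking a reference while the table mutex is still held and dropping it when the caller is done, can be sketched in userland C. This is an illustration under stated assumptions, not the kernel code: struct toy_pcb, struct toy_table, toy_lookup() and toy_unref() are hypothetical, and a pthread mutex with C11 atomics stands in for the kernel's mutex and reference counter.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_pcb {
	unsigned int	 port;
	atomic_uint	 refcnt;
	struct toy_pcb	*next;
};

struct toy_table {
	pthread_mutex_t	 mtx;	/* protects the list */
	struct toy_pcb	*head;
};

/*
 * Look up a PCB by port.  The reference is taken before the mutex
 * is released, so the result cannot be freed under the caller.
 */
struct toy_pcb *
toy_lookup(struct toy_table *table, unsigned int port)
{
	struct toy_pcb *pcb;

	pthread_mutex_lock(&table->mtx);
	for (pcb = table->head; pcb != NULL; pcb = pcb->next) {
		if (pcb->port == port) {
			atomic_fetch_add(&pcb->refcnt, 1);
			break;
		}
	}
	pthread_mutex_unlock(&table->mtx);
	return pcb;
}

/* Drop a reference; returns nonzero if it was the last one. */
int
toy_unref(struct toy_pcb *pcb)
{
	return atomic_fetch_sub(&pcb->refcnt, 1) == 1;
}
```

Every successful toy_lookup() must be paired with one toy_unref() at the end of the calling function, mirroring the rule stated in the commit message.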


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tupel
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@
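
The behaviour described in 1.77 can be sketched as a bounded scan over the
port range. This is a minimal userland illustration, not the kernel code:
pick_port(), port_in_use_fn, all_busy() and busy_except() are hypothetical
names, and the caller is assumed to pass first <= last and a start value
inside the range.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns nonzero if the port is already taken. */
typedef int (*port_in_use_fn)(uint16_t port, void *arg);

/*
 * Scan [first, last] once, beginning at start and wrapping around.
 * Returns 0 and stores the port in *out, or EADDRNOTAVAIL when every
 * port in the range is in use.
 */
static int
pick_port(uint16_t first, uint16_t last, uint16_t start,
    port_in_use_fn in_use, void *arg, uint16_t *out)
{
	uint32_t count = (uint32_t)last - first + 1;
	uint32_t candidate = start;
	uint32_t tried;

	for (tried = 0; tried < count; tried++) {
		if (!in_use((uint16_t)candidate, arg)) {
			*out = (uint16_t)candidate;
			return 0;
		}
		/* Advance and wrap back to the start of the range. */
		if (++candidate > last)
			candidate = first;
	}
	return EADDRNOTAVAIL;
}

/* Sample callbacks used only for illustration. */
static int
all_busy(uint16_t port, void *arg)
{
	(void)port;
	(void)arg;
	return 1;
}

static int
busy_except(uint16_t port, void *arg)
{
	/* Every port is taken except the one pointed to by arg. */
	return port != *(uint16_t *)arg;
}
```

Bounding the loop by the size of the range is what turns "no free port"
into an error return instead of an endless scan.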


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
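
Why a plain modulo is not enough for ranges that are not a power of two
can be shown with a small rejection-sampling sketch. This illustrates the
general technique only and is not claimed to be the kernel's code:
uniform_below(), rand32_fn and lcg_rand32() are hypothetical names, and
the linear congruential generator is a deterministic stand-in for a real
random source.

```c
#include <stdint.h>

/* Hypothetical source of 32 random bits. */
typedef uint32_t (*rand32_fn)(void);

/*
 * Return a value in [0, upper_bound) without modulo bias.
 * Values below (2^32 mod upper_bound) are rejected so that the
 * remaining range is an exact multiple of upper_bound.
 */
static uint32_t
uniform_below(uint32_t upper_bound, rand32_fn rand32)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return 0;

	/* Unsigned negation gives 2^32 - upper_bound. */
	min = -upper_bound % upper_bound;

	for (;;) {
		r = rand32();
		if (r >= min)
			break;
	}
	return r % upper_bound;
}

/* Deterministic stand-in generator, for illustration only. */
static uint32_t
lcg_rand32(void)
{
	static uint32_t state = 0;

	state = state * 1664525u + 1013904223u;
	return state;
}
```

For a power-of-two bound the rejection threshold is zero, so no draw is
ever discarded; for any other bound a few low values are thrown away.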


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.120 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
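
The lookup pattern described in 1.119 can be sketched in userland C.
This is an assumption-laden illustration, not the kernel implementation:
struct pcb, struct pcb_table, table_lookup() and pcb_unref() are
hypothetical names, a pthread mutex stands in for the table mutex, and
the counter is a plain integer guarded by that mutex rather than the
kernel's reference counting API.

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_SLOTS 8

struct pcb {
	unsigned int refcnt;	/* guarded by the table mutex in this sketch */
	int port;
};

struct pcb_table {
	pthread_mutex_t mtx;
	struct pcb *slots[TABLE_SLOTS];
};

/*
 * Find a PCB by port. The reference is taken while the mutex is still
 * held, so the entry cannot disappear between lookup and use.
 */
static struct pcb *
table_lookup(struct pcb_table *t, int port)
{
	struct pcb *found = NULL;
	size_t i;

	pthread_mutex_lock(&t->mtx);
	for (i = 0; i < TABLE_SLOTS; i++) {
		if (t->slots[i] != NULL && t->slots[i]->port == port) {
			found = t->slots[i];
			found->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&t->mtx);
	return found;
}

/*
 * Drop a reference obtained from table_lookup().
 * Returns 1 when the last reference is gone and the caller may free it.
 */
static int
pcb_unref(struct pcb_table *t, struct pcb *p)
{
	int last;

	pthread_mutex_lock(&t->mtx);
	last = (--p->refcnt == 0);
	pthread_mutex_unlock(&t->mtx);
	return last;
}
```

Every caller of table_lookup() pairs it with one pcb_unref() at the end
of the function, which mirrors the "unref it at the end" rule above.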


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@
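
The symmetry mentioned in 1.112 can be demonstrated with a small Toeplitz
hash whose key stream is one 16-bit value repeated. This is a sketch under
that assumption and does not reproduce the kernel's stoeplitz code:
key_window(), toeplitz_hash() and flow_hash_v4() are hypothetical names and
the key value used in any call is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the 32-bit slice of the key stream starting at bit offset bit.
 * The stream is key16 repeated forever, so the slice depends only on
 * bit % 16.
 */
static uint32_t
key_window(uint16_t key16, size_t bit)
{
	uint64_t rep = ((uint64_t)key16 << 32) | ((uint64_t)key16 << 16) |
	    key16;
	unsigned int shift = 16 - (unsigned int)(bit % 16);

	return (uint32_t)(rep >> shift);
}

/* XOR together the key slices selected by the set bits of the input. */
static uint32_t
toeplitz_hash(uint16_t key16, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len * 8; i++) {
		if (data[i / 8] & (0x80u >> (i % 8)))
			hash ^= key_window(key16, i);
	}
	return hash;
}

/* Hash an IPv4 4-tuple laid out as addr, addr, port, port. */
static uint32_t
flow_hash_v4(uint16_t key16, uint32_t a1, uint32_t a2, uint16_t p1,
    uint16_t p2)
{
	uint8_t buf[12] = {
		(uint8_t)(a1 >> 24), (uint8_t)(a1 >> 16),
		(uint8_t)(a1 >> 8), (uint8_t)a1,
		(uint8_t)(a2 >> 24), (uint8_t)(a2 >> 16),
		(uint8_t)(a2 >> 8), (uint8_t)a2,
		(uint8_t)(p1 >> 8), (uint8_t)p1,
		(uint8_t)(p2 >> 8), (uint8_t)p2,
	};

	return toeplitz_hash(key16, buf, sizeof(buf));
}
```

Swapping the two addresses moves each bit by 32 positions and swapping the
ports moves each bit by 16, so every bit still selects the same key slice
and the XOR is unchanged. That is why the argument order does not affect
the resulting flowid.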


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@
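
Why the bound matters: reducing a 32-bit random word with a plain modulo is only unbiased when the bound is a power of two. The sketch below illustrates the rejection-sampling idea behind a uniform bounded draw. It is written in Python purely for illustration; `uniform_below` and its `rand32` parameter are hypothetical names, and this is not the kernel's code.

```python
import secrets

def uniform_below(upper_bound, rand32=lambda: secrets.randbits(32)):
    """Return an unbiased integer in [0, upper_bound) from 32-bit random words."""
    if upper_bound < 2:
        return 0
    # The lowest (2**32 % upper_bound) values would make some residues
    # one draw more likely than the others, so they are rejected.
    threshold = (1 << 32) % upper_bound
    while True:
        r = rand32()
        if r >= threshold:
            return r % upper_bound
```

For a power-of-two bound the threshold is 0, so no draw is ever rejected and the loop reduces to a plain mask.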


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove a couple of #if 0'ed portions we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
  just like "arp -s".
- revise source address selection.
  be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
  packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
  IPv6 model where multiple addresses on an interface are normal.
  (kernel side supports them for a while for backward compat,
  the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
  rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.119 08-Aug-2022 bluhm

To make protocol input functions MP safe, internet PCBs need protection.
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
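
The lookup rule described in this entry (take the reference while the table mutex is still held, drop it when the caller is done) can be sketched as follows. This is an illustrative Python model under stated assumptions, not the kernel implementation: `Pcb`, `PcbTable`, `lookup` and `unref` are hypothetical names, a `threading.Lock` stands in for table->inpt_mtx, and the counter is kept under that same lock for simplicity where a kernel would use an atomic reference count.

```python
import threading

class Pcb:
    def __init__(self, key):
        self.key = key
        self.refcnt = 1  # reference owned by the table itself

class PcbTable:
    def __init__(self):
        self._mtx = threading.Lock()  # stands in for table->inpt_mtx
        self._hash = {}

    def insert(self, pcb):
        with self._mtx:
            self._hash[pcb.key] = pcb

    def lookup(self, key):
        """Return the matching PCB with an extra reference, or None."""
        with self._mtx:
            pcb = self._hash.get(key)
            if pcb is not None:
                pcb.refcnt += 1  # taken before the mutex is released
            return pcb

    def unref(self, pcb):
        """Drop a reference obtained from lookup(); return the new count."""
        with self._mtx:
            pcb.refcnt -= 1
            return pcb.refcnt
```

Because the count is raised before the lock is dropped, a concurrent removal from the table cannot free the object while the caller still uses it.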


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@
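
The ordering problem in this entry can be modelled in a few lines: the uniqueness check has to be repeated once the automatic bind has chosen a local port. The sketch below is a simplified Python model, not the kernel code. The `connect` function, the dictionary-based socket, the set of 4-tuples and the `pick_port` callback are all hypothetical, and resetting the port on failure is an assumption of this sketch.

```python
import errno

def connect(table, sock, faddr, fport, pick_port):
    """Connect sock; return 0 or an errno value. table is a set of 4-tuples."""
    # First check: only meaningful when the socket is already bound.
    if sock["lport"] and (sock["laddr"], sock["lport"], faddr, fport) in table:
        return errno.EADDRINUSE
    if sock["lport"] == 0:
        # Automatic bind; with address reuse it may return a port in use.
        sock["lport"] = pick_port()
        # Second check: the freshly chosen port may collide with an
        # existing connection to the same foreign address and port.
        if (sock["laddr"], sock["lport"], faddr, fport) in table:
            sock["lport"] = 0
            return errno.EADDRINUSE
    table.add((sock["laddr"], sock["lport"], faddr, fport))
    return 0
```

Without the second check, the first call in a collision scenario would add a duplicate 4-tuple instead of failing with EADDRINUSE.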


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-ifp
variables to store the current hop limit.

Input from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

witch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarfication.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 addres in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occurs on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.118 06-Aug-2022 bluhm

Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@


# 1.117 14-Apr-2022 claudio

Relax address availability check for multicast binds.

While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@


Revision tags: OPENBSD_7_1_BASE
# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.116 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
an alternate routing table and separating them from other interfaces in
distinct routing tables. The same network can now be used in any domain at
the same time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.115 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.114 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
an alternate routing table and separating them from other interfaces in
distinct routing tables. The same network can now be used in any domain at
the same time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.113 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Input from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.112 11-Feb-2021 patrick

Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.

Coverity CID 1501717
ok dlg@


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tupel
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and doing the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface are normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.111 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_7_BASE OPENBSD_6_8_BASE
# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and doing the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.110 29-Nov-2019 nayden

add __func__ to panic() and printf() calls in sys/netinet6/*
ok benno@ mortimer@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.109 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE OPENBSD_6_6_BASE
# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4 tupel
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and the bit manipulation code is inconsistent, so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple addresses on an interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.108 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing commited in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declaration from sources files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

witch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarfication.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 addres in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.107 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.106 11-Sep-2018 bluhm

Convert inetctlerrmap to u_char like inet6ctlerrmap. That is also
what FreeBSD does. Remove old #if 0 version of inet6ctlerrmap.
OK mpi@


# 1.105 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls user suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


# 1.104 14-Jun-2018 bluhm

Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@


# 1.103 07-Jun-2018 bluhm

The global zero addresses must not change, mark them constant.
OK tb@ visa@


# 1.102 03-Jun-2018 bluhm

Use variable names for rtable and rdomain consistently in the in_pcb
functions.
discussed with and OK mpi@ visa@


# 1.101 03-Jun-2018 bluhm

Consistently call the inpcb table parameter "table" in in6_pcbnotify().
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther


Revision tags: OPENBSD_5_8_BASE
# 1.69 19-Jul-2015 deraadt

branches: 1.69.4;
tame(2) is a subsystem which restricts programs into a "reduced feature
operating model". This is the kernel component; various changes should
proceed in-tree for a while before userland programs start using it.
ok miod, discussions and help from many


# 1.68 08-Jun-2015 krw

More damned eye searing whitespace. No change to .o files.


Revision tags: OPENBSD_5_7_BASE
# 1.67 05-Dec-2014 mpi

branches: 1.67.2;
Explicitly include <net/if_var.h> instead of pulling it in <net/if.h>.

ok mikeb@, krw@, bluhm@, tedu@


# 1.66 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


# 1.65 21-Aug-2014 mpi

Misleading comments about splnet().


Revision tags: OPENBSD_5_6_BASE
# 1.64 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.63 03-Jun-2014 mpi

Do not include <sys/malloc.h> where it is not needed.


# 1.62 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.61 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.60 06-Apr-2014 chrisz

Remove redundant call to in{,6}_pcbbind() from tcp PRU_CONNECT.
Make sure that in_pcbbind() is called from in_pcbconnect() by KASSERTing that
local port == 0 implies an unspecified local address.

OK claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.59 08-Jan-2014 bluhm

Name the local variables for struct ifaddr consistently "ifa".
OK mikeb@


# 1.58 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.57 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.56 31-May-2013 bluhm

Remove a bunch of sockaddr_in6 pointer casts and replace others
with sin6tosa() or satosin6() inline functions. This allows the
compiler to check the types more strictly.
OK mpi@


# 1.55 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.54 10-Apr-2013 mpi

Remove various external variable declarations from source files and
move them to the corresponding header with an appropriate comment if
necessary.

ok guenther@


# 1.53 28-Mar-2013 tedu

no need for a lot of code to include proc.h


# 1.52 25-Mar-2013 mpi

Substitute the handcrafted list of IPv6 addresses by a proper TAILQ.

ok bluhm@, mikeb@


# 1.51 04-Mar-2013 bluhm

Replace the cast to struct in6_ifaddr pointer with the ifatoia6() macro.
No binary change.
OK claudio@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.50 24-Nov-2011 sperreault

rdomain support for IPv6
ok mikeb


Revision tags: OPENBSD_4_6_BASE OPENBSD_4_7_BASE OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.49 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing table and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_5_BASE
# 1.48 23-Nov-2008 claudio

When accessing cached routes make sure the route is actually still valid.
Before accessing a ro_rt make sure the route is either freshly allocated or
RTF_UP is set. If not ro_rt should be freed and reallocated or at least no
info from the ro_rt should be considered valid.
This seems to solve the crashes seen by Felipe Alfaro Solana.
some sort of OK dlg@


Revision tags: OPENBSD_4_4_BASE
# 1.47 11-Jun-2008 mcbride

ANSIfy to sync with KAME. From Karl Sjodahl <dunceor@gmail.com>.

ok todd deraadt naddy bluhm


# 1.46 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.45 19-May-2008 markus

SO_BINDANY for ipv6; ok djm@


# 1.44 18-Apr-2008 djm

use arc4random_uniform() for random number requests that are not a
power of two.

use arc4random_bytes() when requesting more than a word of PRNG
output.

ok deraadt@


Revision tags: OPENBSD_3_8_BASE OPENBSD_3_9_BASE OPENBSD_4_0_BASE OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.43 24-Jun-2005 markus

simplify port allocation in pcb_bind(); based on freebsd; ok claudio henning


Revision tags: OPENBSD_3_5_BASE OPENBSD_3_6_BASE OPENBSD_3_7_BASE SMP_SYNC_A SMP_SYNC_B
# 1.42 06-Feb-2004 itojun

permit IPv6-only operation (permit AF_INET6 bind(2) without IPv4 address).
found by todd fries. markus ok


# 1.41 05-Feb-2004 itojun

remove never-to-be-used codepath (IPv4 mapped address). ok mcbride


# 1.40 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.39 21-Dec-2003 markus

use CIRCLEQ* for pcb's; ok deraadt, henning, mcbride, with help from canacar


# 1.38 04-Nov-2003 markus

don't call in_pcbrehash twice; ok itojun@


# 1.37 01-Oct-2003 itojun

use random number generator to generate IPv6 fragment ID/flowlabel.
cleanup IPv6 flowlabel handling. deraadt ok


# 1.36 28-Sep-2003 cloder

Correct off-by-ones with respect to PRC_NCMDS. Mostly from FreeBSD.
OK krw@, deraadt@


Revision tags: OPENBSD_3_4_BASE
# 1.35 15-Aug-2003 tedu

change arguments to suser. suser now takes the process, and a flags
argument. old cred only calls use suser_ucred. this will allow future
work to more flexibly implement the idea of a root process. looks like
something i saw in freebsd, but a little different.
use of suser_ucred vs suser in file system code should be looked at again,
for the moment semantics remain unchanged.
review and input from art@ testing and further review miod@


# 1.34 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_3_BASE UBC_SYNC_A
# 1.33 15-Mar-2003 deraadt

specifed -> specified


Revision tags: OPENBSD_3_2_BASE UBC_SYNC_B
# 1.32 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame


# 1.31 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.30 20-Aug-2002 itojun

more comment on bind(deprecated) handling


# 1.29 09-Jun-2002 itojun

correct getpeername(2).


Revision tags: OPENBSD_3_1_BASE
# 1.28 14-Mar-2002 millert

First round of __P removal in sys


# 1.27 21-Jan-2002 itojun

remove couple of #if 0'ed portion we will never use


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.26 05-Jun-2001 deraadt

branches: 1.26.4;
repair copyright notices for NRL & cmetz; cmetz


Revision tags: OPENBSD_2_9_BASE
# 1.25 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.24 16-Feb-2001 itojun

kill register declarations. to sync with kame better.


# 1.23 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


# 1.22 06-Jan-2001 itojun

prohibited binding to an anycast, notready, or detached IPv6 address.
sync with kame 1.46 -> 1.47


# 1.21 21-Dec-2000 itojun

correct ipv6 path mtu discovery.


Revision tags: OPENBSD_2_8_BASE
# 1.20 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.19 18-Jun-2000 itojun

KNF (sorry craig)


# 1.18 18-Jun-2000 itojun

remove now-unnecessary statement due to "for" logic clarification.


# 1.17 18-Jun-2000 itojun

correct logic mistake in in6_pcbnotify, due to indentation.
will KNF it soon.


# 1.16 18-Jun-2000 itojun

use in6_recoverscope


# 1.15 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


# 1.14 03-Jun-2000 itojun

correctly handle ctlinput messages for IPv6.


# 1.13 28-May-2000 itojun

do not treat bind(2) with IPv4 mapped address in a special way.
old code fails to check for port number duplicate.
XXX should remove more IPv4 mapped code


Revision tags: OPENBSD_2_7_BASE
# 1.12 27-Apr-2000 itojun

avoid infinite loop in in{6,}_pcbnotify (can occur on family mismatch)


# 1.11 21-Apr-2000 itojun

NRL pcb issue; inp_{f,l}addr{,6} is a union so we need to be sure about
af match.
- do not touch IPv4 pcb entries on in6_pcbnotify.
- do not touch IPv6 pcb entries on in_pcbnotify.


# 1.10 28-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
- nuke xxCTL_VARS #define, they are for BSDI.
- disable SIOCSIFDSTADDR_IN6/SIOCSIFNETMASK_IN6 ioctl, they do not fit
IPv6 model where multiple address on interface is normal.
(kernel side supports them for a while for backward compat,
the support will be nuked shortly)
- introduce "default outgoing interface" (for spec conformance in very
rare case)


Revision tags: SMP_BASE
# 1.9 07-Feb-2000 itojun

branches: 1.9.2;
fix include file path related to ip6.


# 1.8 10-Dec-1999 angelos

Add RCS tags, remove unused header files and code, remove a few
unnecessary ifdefs...


# 1.7 08-Dec-1999 angelos

Removed about 24KB of ifdef'ed code. It's nice to be able to see what
other OSes do, but not if I can't read our code.


Revision tags: kame_19991208
# 1.6 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.5 24-Mar-1999 cmetz

Replace 'in6a_words' (old NRL convention) with 's6_addr32' (new BSDI et al.
convention that is more common and more specific as to the access size)


# 1.4 09-Mar-1999 cmetz

Demangled the INET6 stuff so as not to require any extra options and not to
be mutually exclusive with the IPSEC option.


# 1.3 24-Feb-1999 cmetz

Synchronized changes needed to integrate into OpenBSD with the NRL source
tree so we can have a unified netinet6 directory.


# 1.2 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


# 1.1 06-Jan-1999 deraadt

first few files of NRL ipv6. This NRL release was officially exported
to me by US DOD officials, with the crypto already removed.


Revision tags: OPENBSD_6_2_BASE
# 1.100 11-Aug-2017 bluhm

Validate sockaddr from userland in central functions. This results
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@


# 1.99 04-Aug-2017 bluhm

The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@


# 1.98 13-May-2017 bluhm

Do not check for locally bound mapped addresses in in6_pcbconnect(),
this is done during bind(2) in in6_pcbaddrisavail().
OK mpi@


Revision tags: OPENBSD_6_1_BASE
# 1.97 07-Mar-2017 bluhm

When the inpcb queue and hash lists are traversed or modified we
need netlock. Remove the obsolete splnet.
OK mpi@


# 1.96 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.95 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.94 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.93 05-Jul-2016 mpi

Expand IN6_IFF_NOTREADY, ok bluhm@


# 1.92 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.91 05-Apr-2016 vgross

Move reserved port checks from in(6)_pcbaddrisavail() to in_pcbbind().
Kill old comments while at it.

Ok mpi@ bluhm@


# 1.90 30-Mar-2016 vgross

Use in6_pcbhashlookup() in in6_pcbconnect(). We don't need in_pcblookup()
broad search and in_pcbconnect() already uses in_pcbhashlookup().

ok bluhm@ mpi@ jca@


# 1.89 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.88 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


# 1.87 20-Mar-2016 jca

Revert, missing decl for in6_pcbaddrisavail() breaks kernel build.

Spotted by deraadt@


# 1.86 19-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output().

Ok jca@ bluhm@


# 1.85 12-Mar-2016 vgross

Add checks on overlapping IPv6 sockets ownership

ok mpi@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.84 18-Dec-2015 vgross

branches: 1.84.2;
Fix SO_REUSE* flags effects when binding multicast addresses. No
regression observed on avahi.

ok benno@


# 1.83 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.82 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.81 20-Oct-2015 deraadt

At guenther's suggestion replace dnssocket() with a SOCK_DNS flag on
socket(). Without pledge, all other socket behaviours become permitted,
except this one case: connect/send* only works to *:53. In pledge mode,
a very few are further restricted. Some backwards compatibility for
the dnssocket/dnsconnect calls will remain in the tree temporarily so
that people can build through the transition.
ok tedu guenther semarie


# 1.80 19-Oct-2015 mpi

Stop checking for RTF_UP directly, call rtisvalid(9) instead.

While here add two missing ``rtableid'' checks in in6_selectsrc().

ok bluhm@


# 1.79 19-Oct-2015 vgross

deduplicate in[6]_pcbbind() port scan loop.

ok mpi@


# 1.78 18-Oct-2015 deraadt

Add two new system calls: dnssocket() and dnsconnect(). This creates a
SS_DNS tagged socket which has limited functionality (for example, you
cannot accept on them...) The libc resolver will switch to using these,
therefore pledge can identify a DNS transaction better.
ok tedu guenther kettenis beck and others


# 1.77 15-Oct-2015 vgross

in6_pcbconnect() returns EADDRNOTAVAIL when
all the ports in the range portfirst .. portlast
are in use.

ok millert@, mpi@


# 1.76 09-Oct-2015 deraadt

Rename tame() to pledge(). This fairly interface has evolved to be more
strict than anticipated. It allows a programmer to pledge/promise/covenant
that their program will operate within an easily defined subset of the
Unix environment, or it pays the price.


# 1.75 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


# 1.74 11-Sep-2015 deraadt

Convert _TM_ flags to TAME_ flags, collapsing the entire mapping
layer because the strings select the right options. Mechanical
conversion.
ok guenther


# 1.73 11-Sep-2015 guenther

Only include <sys/tame.h> in the .c files that need it

ok deraadt@ miod@


# 1.72 11-Sep-2015 claudio

in6_embedscope() needs to lose some weight. Remove the last argument.
In all but two calls NULL is passed and in the other 2 cases the ifp
is only used to maybe feed it to in6_selecthlim() to select the hoplimit
for the link. Since in6_embedscope() only works on link-local addresses
it does not matter what hop limit we select since the destination is
directly reachable.
OK florian@ mpi@


# 1.71 10-Sep-2015 claudio

It is time to put inet6 on a diet. Use the flensing knife and cut out
the 3rd argument of in6_recoverscope() and make it return void.
OK dlg@ mikeb@


# 1.70 22-Aug-2015 deraadt

Move to tame(int flags, char *paths[]) API/ABI.

The pathlist is a whitelist of dirs and files; anything else returns ENOENT.
Recommendation is to use a narrowly defined list. Also add TAME_FATTR, which
permits explicit change operations against "struct stat" fields. Some
other TAME_ flags are refined slightly.

Not cranking libc now, since nothing committed in base uses this and the
timing is uncomfortable for others. Discussed with many; thanks for a
few bug fixes from semarie, doug, guenther.
ok guenther

