History log of /openbsd-current/sys/netinet/in_pcb.h
# 1.157 19-Apr-2024 bluhm

Merge IPv4 and IPv6 options in inpcb.

An internet PCB has either inp_options or inp_outputopts6. Put them
into a common anonymous union.

OK mvs@ kn@
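Here is a minimal C11 sketch of the layout this commit describes. It uses hypothetical stand-in types instead of the kernel's real option structures. A PCB is either IPv4 or IPv6, so the two option pointers can share storage. An anonymous union lets code keep writing p->inp_options or p->inp_outputopts6 as if they were direct fields. The is_ipv6 selector stands in for the INP_IPV6 flag and is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real IPv4 and IPv6 option types. */
struct v4_opts { int len; };
struct v6_opts { int hoplimit; };

struct pcb_sketch {
	int is_ipv6;			/* like INP_IPV6: selects the live member */
	union {				/* anonymous union (C11) */
		struct v4_opts *inp_options;		/* valid when !is_ipv6 */
		struct v6_opts *inp_outputopts6;	/* valid when is_ipv6 */
	};
};

/* Read the IPv6 hop limit only when the PCB really carries IPv6 options. */
int
pcb_hoplimit(const struct pcb_sketch *p)
{
	if (!p->is_ipv6 || p->inp_outputopts6 == NULL)
		return -1;
	return p->inp_outputopts6->hoplimit;
}
```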


# 1.156 17-Apr-2024 bluhm

Use struct ipsec_level within inpcb.

Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is globally visible for netstat(1), so
put struct ipsec_level outside of #ifdef _KERNEL.

OK deraadt@ mvs@
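The following sketch illustrates the type-safety argument. The member names for the four levels are an assumption, since the commit only says the struct holds four ipsec levels. A function that takes the struct cannot accidentally receive an arbitrary u_char array, which a u_char[4] parameter (decaying to u_char *) would silently accept.

```c
#include <stdint.h>

/* Assumed member names; the real struct ipsec_level may differ. */
struct ipsec_level_sketch {
	uint8_t sl_auth;
	uint8_t sl_esp_trans;
	uint8_t sl_esp_network;
	uint8_t sl_ipcomp;
};

/* Passing the struct means the compiler rejects a stray u_char pointer. */
int
needs_esp(struct ipsec_level_sketch lvl, uint8_t threshold)
{
	return lvl.sl_esp_trans >= threshold ||
	    lvl.sl_esp_network >= threshold;
}
```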


# 1.155 15-Apr-2024 bluhm

Delete unused inp_csumoffset define.

OK mvs@


# 1.154 22-Mar-2024 bluhm

Remove padding from union inpaddru.

Alignment of IPv4 address with lower part of IPv6 address looks
like a leftover from times when IPv6 compatible addresses should
contain IPv4 addresses. Better use a simple union for both IPv4 and
IPv6 addresses like everywhere else. Use this type also for common
zero address.

OK mvs@
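A minimal sketch of a plain IPv4/IPv6 address union with one shared all-zero constant, as the commit describes. The type and function names are hypothetical, not the kernel's inpaddru. Both members start at offset 0, so comparing the first 4 or 16 bytes against the shared zero value detects the unspecified address of either family.

```c
#include <netinet/in.h>
#include <string.h>

/* A plain union, no padding tricks; names are hypothetical. */
union addr_sketch {
	struct in_addr  a4;
	struct in6_addr a6;
};

/* One shared all-zero value serves as the unspecified address for both families. */
static const union addr_sketch zero_addr_sketch;

int
addr_is_unspecified(const union addr_sketch *a, int is_ipv6)
{
	size_t len = is_ipv6 ? sizeof(a->a6) : sizeof(a->a4);

	return memcmp(a, &zero_addr_sketch, len) == 0;
}
```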


# 1.153 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere with how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@
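A simplified sketch of the change in the uniqueness rule. It uses a hypothetical flat array of (local address, local port) bindings instead of the real PCB hash tables. It deliberately ignores wildcard sockets, whose matching during bind(2) the commit says is unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one bound (local address, local port) pair. */
struct binding { uint32_t laddr; uint16_t lport; };

/* Old rule: a port is taken if any binding uses it, whatever the address. */
int
port_in_use_global(const struct binding *b, size_t n, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (b[i].lport == port)
			return 1;
	return 0;
}

/* New rule for connect(2): the port only has to be unique per laddr. */
int
port_in_use_per_laddr(const struct binding *b, size_t n, uint32_t laddr,
    uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (b[i].lport == port && b[i].laddr == laddr)
			return 1;
	return 0;
}
```

Under the per-address rule, port 40000 stays available on 10.0.0.2 even though 10.0.0.1 already uses it. That is the extra supply of outgoing ports the commit is after.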


Revision tags: OPENBSD_7_5_BASE
# 1.152 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read-once
unlocked accesses to reduce the performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
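A userland sketch of the GRAB/HOLD pattern using a pthread mutex. LOCK_GRAB, LOCK_HOLD and the table type are hypothetical stand-ins for IN_PCBLOCK_GRAB, IN_PCBLOCK_HOLD and the inpcb table. A bind-like caller takes the mutex once. It then calls the lookup with LOCK_HOLD, so the uniqueness check and the insert happen in a single critical section. The sketch leaves out the reference counting that the real lookup functions perform.

```c
#include <pthread.h>

/* Hypothetical names mirroring IN_PCBLOCK_GRAB / IN_PCBLOCK_HOLD. */
enum lock_mode { LOCK_GRAB, LOCK_HOLD };

struct table_sketch {
	pthread_mutex_t mtx;
	int entries[16];
};

/* Look up a key; take the mutex only if the caller does not hold it. */
int
table_lookup(struct table_sketch *t, int key, enum lock_mode mode)
{
	int found = 0;

	if (mode == LOCK_GRAB)
		pthread_mutex_lock(&t->mtx);
	for (int i = 0; i < 16; i++)
		if (t->entries[i] == key)
			found = 1;
	if (mode == LOCK_GRAB)
		pthread_mutex_unlock(&t->mtx);
	return found;
}

/* Bind-like operation: check and insert inside one critical section. */
int
table_insert_unique(struct table_sketch *t, int slot, int key)
{
	int ok;

	pthread_mutex_lock(&t->mtx);
	ok = !table_lookup(t, key, LOCK_HOLD);
	if (ok)
		t->entries[slot] = key;
	pthread_mutex_unlock(&t->mtx);
	return ok;
}
```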


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@
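A sketch of keeping the hashed fields and the stored bucket consistent. It assumes a toy hash in place of the kernel's keyed SipHash, and its field and function names are hypothetical. The foreign address, port and bucket are written under the same mutex that table lookups would take. As a result, no lookup can observe new addresses paired with a stale bucket.

```c
#include <pthread.h>
#include <stdint.h>

#define NBUCKETS 64

/* Hypothetical PCB with the fields the hash depends on. */
struct pcb_h {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
	unsigned rtable;
	unsigned bucket;	/* must always match the fields above */
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Toy hash standing in for the kernel's keyed SipHash. */
static unsigned
pcb_bucket(uint32_t la, uint16_t lp, uint32_t fa, uint16_t fp, unsigned rt)
{
	uint32_t h = la ^ (fa * 2654435761u) ^ ((uint32_t)lp << 16 | fp) ^ rt;

	return h % NBUCKETS;
}

/* Update foreign address/port and the bucket in one critical section. */
void
pcb_set_foreign(struct pcb_h *p, uint32_t faddr, uint16_t fport)
{
	pthread_mutex_lock(&table_mtx);
	p->faddr = faddr;
	p->fport = fport;
	p->bucket = pcb_bucket(p->laddr, p->lport, faddr, fport, p->rtable);
	pthread_mutex_unlock(&table_mtx);
}
```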


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as a parameter. This is only used to look up
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
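A sketch of hashing outside the lock. A fixed 64-bit integer mixer stands in for SipHash; this is an assumption, and the mixer is not a keyed PRF. The secret never changes, not even on resize, so the hash can be computed before locking. Only the cheap step that depends on the table's current mask runs under the mutex.

```c
#include <pthread.h>
#include <stdint.h>

/* Never reseeded, so computing the hash needs no lock. */
static const uint64_t hash_secret = 0x9e3779b97f4a7c15ull;

struct lookup_table {
	pthread_mutex_t mtx;
	unsigned mask;		/* size - 1; only changes under mtx on resize */
	int hits[64];
};

/* Simple integer mixer; a stand-in for the kernel's keyed SipHash. */
static uint64_t
keyed_hash(uint32_t faddr, uint16_t fport)
{
	uint64_t h = hash_secret ^ ((uint64_t)faddr << 16 | fport);

	h ^= h >> 33;
	h *= 0xff51afd7ed558ccdull;
	h ^= h >> 33;
	return h;
}

void
table_record(struct lookup_table *t, uint32_t faddr, uint16_t fport)
{
	uint64_t h = keyed_hash(faddr, fport);	/* expensive part, unlocked */

	pthread_mutex_lock(&t->mtx);
	t->hits[h & t->mask]++;			/* cheap masking, locked */
	pthread_mutex_unlock(&t->mtx);
}
```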


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
the PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@
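A userland sketch of the safer API shape. It uses hypothetical names and a pthread mutex standing in for the lock that guards the selected address. Rather than returning a pointer into lock-protected storage, the function copies the value into memory the caller owns before releasing the lock.

```c
#include <netinet/in.h>
#include <pthread.h>

/* Hypothetical shared state whose lifetime is managed by src_mtx. */
static pthread_mutex_t src_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct in_addr current_src;

void
set_src_sketch(in_addr_t a)
{
	pthread_mutex_lock(&src_mtx);
	current_src.s_addr = a;
	pthread_mutex_unlock(&src_mtx);
}

/*
 * Returning &current_src would hand out a pointer whose target can change
 * once the lock is dropped.  Copying while the lock is still held gives
 * the caller a stable snapshot in its own memory.
 */
int
select_src_copy(struct in_addr *out)
{
	pthread_mutex_lock(&src_mtx);
	*out = current_src;
	pthread_mutex_unlock(&src_mtx);
	return 0;
}
```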


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@
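A sketch of the collect-then-deliver pattern. It uses hypothetical types and a trivial deliver() standing in for udp_sbappend() and sorwakeup(). The table mutex is held only while walking the list and linking matching PCBs onto a private list. Delivery, which may take other locks, runs after the mutex is released. The sketch omits the exclusive net lock and the reference counting that keep the collected PCBs alive in the kernel.

```c
#include <pthread.h>
#include <stddef.h>

struct pcb_l {
	int matches;			/* should this PCB get the datagram? */
	int delivered;
	struct pcb_l *next;		/* table list, protected by tbl_mtx */
	struct pcb_l *notify_next;	/* private list built by the collector */
};

static pthread_mutex_t tbl_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for udp_sbappend()/sorwakeup(), which may take other locks. */
static void
deliver(struct pcb_l *p)
{
	p->delivered++;
}

void
broadcast_input(struct pcb_l *head)
{
	struct pcb_l *list = NULL, *p;

	/* Phase 1: walk the table under the mutex and only collect. */
	pthread_mutex_lock(&tbl_mtx);
	for (p = head; p != NULL; p = p->next)
		if (p->matches) {
			p->notify_next = list;
			list = p;
		}
	pthread_mutex_unlock(&tbl_mtx);

	/* Phase 2: deliver with the mutex released. */
	for (p = list; p != NULL; p = p->notify_next)
		deliver(p);
}
```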


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
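The commit does not show which expression was changed, so the sketch below only illustrates the general pattern. With a 32-bit int, 1 << 31 shifts a bit into the sign position, which C leaves undefined. Shifting an unsigned operand is defined for any count below the type's width. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Undefined for bit == 31 with a 32-bit int:
 *	int bad = 1 << bit;
 * Well defined for 0 <= bit < 32: shift an unsigned operand instead.
 */
uint32_t
bit_mask(unsigned bit)
{
	return UINT32_C(1) << bit;
}
```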


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
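A minimal C11 sketch of reference counting with stdatomic.h. The type and function names are hypothetical, and the kernel's own interface from sys/refcnt.h is not reproduced here. Each holder (table, pf mbuf header, pf state key) owns one reference, and the last release frees the object.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted PCB; not the kernel's struct refcnt API. */
struct pcb_ref {
	atomic_uint refs;
};

struct pcb_ref *
pcb_alloc(void)
{
	struct pcb_ref *p = malloc(sizeof(*p));

	if (p != NULL)
		atomic_init(&p->refs, 1);	/* the table's reference */
	return p;
}

/* Another holder, e.g. a pf state key, takes a reference. */
struct pcb_ref *
pcb_hold(struct pcb_ref *p)
{
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Drop one reference; free on the last.  Returns 1 if freed. */
int
pcb_release(struct pcb_ref *p)
{
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		free(p);
		return 1;
	}
	return 0;
}
```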


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
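A sketch of the growth rule as stated: double the table once the entry count reaches 75% of the size. The helper name is hypothetical. Multiplying both sides by 4 keeps the comparison in integer arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: the table size to use for the given entry count. */
size_t
next_table_size(size_t size, size_t count)
{
	/* count >= 3/4 * size, without floating point */
	if (count * 4 >= size * 3)
		return size * 2;
	return size;
}
```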


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used at the IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got an rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
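A sketch of a bitmap that covers the full port range, assuming one bit per port stored in 32-bit words. The struct and helper names are hypothetical, not the kernel's baddynamic macros. A port picker would call bd_test() and skip any candidate whose bit is set.

```c
#include <stdint.h>

#define NPORTS 65536

/* One bit per port: 65536 / 32 = 2048 words, 8 KB in total. */
struct baddynamic_sketch {
	uint32_t bits[NPORTS / 32];
};

void
bd_set(struct baddynamic_sketch *b, uint16_t port)
{
	b->bits[port >> 5] |= UINT32_C(1) << (port & 31);
}

int
bd_test(const struct baddynamic_sketch *b, uint16_t port)
{
	return (b->bits[port >> 5] >> (port & 31)) & 1;
}
```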


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
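A single-threaded sketch of the two-way link, with hypothetical minimal types in place of the real pf_state_key and inpcb. After the first lookup, each side points at the other, and later inbound packets can use the cached pointer. Removing either side clears both pointers. Later revisions in this log protect these pointers with a mutex; this sketch does no locking.

```c
#include <stddef.h>

/* Hypothetical minimal types; the real structures differ. */
struct state_key;
struct pcb_link { struct state_key *sk; };
struct state_key { struct pcb_link *inp; };

/* After the first successful lookup, point each object at the other. */
void
link_pair(struct state_key *sk, struct pcb_link *inp)
{
	sk->inp = inp;
	inp->sk = sk;
}

/* When either side goes away, clear both pointers so none dangles. */
void
unlink_pair(struct state_key *sk, struct pcb_link *inp)
{
	if (sk != NULL)
		sk->inp = NULL;
	if (inp != NULL)
		inp->sk = NULL;
}

/* Inbound fast path: NULL means no cached PCB, do the full lookup. */
struct pcb_link *
inbound_pcb(const struct state_key *sk)
{
	return sk != NULL ? sk->inp : NULL;
}
```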


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack: sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected)
reduced the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
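A sketch of the receive-side comparison only. The function name is hypothetical, and the sketch assumes a minttl of 0 means the option is unset. A daemon with a directly connected peer would set IP_MINTTL to 255. Any packet that crossed a router (TTL 254 or lower) then fails the check.

```c
#include <stdint.h>

/*
 * Receive-side minimum TTL check in the spirit of RFC 3682.
 * Assumption: minttl == 0 means the socket option was never set.
 */
int
minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
	return minttl == 0 || pkt_ttl >= minttl;
}
```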


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
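A sketch of flag bit-fields declared with unsigned int, as the commit requires. The field names are hypothetical. C89 only guarantees int, signed int and unsigned int as bit-field types (C99 adds _Bool), so narrower types such as u_char are not portable there.

```c
/* Portable bit-field declarations: the base type is unsigned int. */
struct flags_sketch {
	unsigned int f_bound : 1;	/* rather than u_char f_bound : 1 */
	unsigned int f_conn  : 1;
	unsigned int f_spare : 30;
};
```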


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api; the kernel notifies userland key management
applications (EMT_REQUESTSA signal) when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.
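
The bitmask can be pictured as one bit per port number, 0 to 65535,
packed into 32-bit words. The following is a minimal C sketch under
stated assumptions, not the kernel header: the macro names echo the
DP_CLR() macro added in 1.8, but the layout, the struct name
baddynamic_sketch, and the helper baddynamic_init() are hypothetical.
Only the four ports mentioned elsewhere in this log are seeded; the real
default lists are longer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per port number 0..65535, packed into 32-bit words. */
#define DP_MAPBITS	32u
#define DP_MAPSIZE	(65536u / DP_MAPBITS)

#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (UINT32_C(1) << ((p) % DP_MAPBITS)))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(UINT32_C(1) << ((p) % DP_MAPBITS)))
#define DP_ISSET(m, p)	(((m)[(p) / DP_MAPBITS] & (UINT32_C(1) << ((p) % DP_MAPBITS))) != 0)

/* Hypothetical container with separate tcp and udp maps. */
struct baddynamic_sketch {
	uint32_t	tcp[DP_MAPSIZE];
	uint32_t	udp[DP_MAPSIZE];
};

/* Mark ports that must never be handed out as dynamic ports. */
static void
baddynamic_init(struct baddynamic_sketch *bd)
{
	static const uint16_t tcp_bad[] = { 587, 853 };	/* from 1.23, 1.118 */
	static const uint16_t udp_bad[] = { 623, 664 };	/* from 1.54, 1.55 */
	size_t i;

	assert(bd != NULL);
	for (i = 0; i < DP_MAPSIZE; i++)
		bd->tcp[i] = bd->udp[i] = 0;
	for (i = 0; i < sizeof(tcp_bad) / sizeof(tcp_bad[0]); i++)
		DP_SET(bd->tcp, tcp_bad[i]);
	for (i = 0; i < sizeof(udp_bad) / sizeof(udp_bad[0]); i++)
		DP_SET(bd->udp, udp_bad[i]);
}
```

A sysctl handler would then only need DP_SET()/DP_CLR() on the map, and
the port picker would skip any candidate for which DP_ISSET() is true.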


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
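
The deadlock fix quoted above comes down to bounding the search. Each
port in the selected range is tried at most once, and exhaustion becomes
an error instead of an endless loop. A minimal C sketch, assuming a
caller-supplied list of busy ports and a starting offset (in the kernel
this would come from a random source). pick_port() and its parameters
are hypothetical and do not reproduce the kernel's in_pcbpickport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear membership test; a real stack would consult its PCB hash. */
static bool
port_is_used(uint16_t port, const uint16_t *used, size_t nused)
{
	size_t i;

	for (i = 0; i < nused; i++)
		if (used[i] == port)
			return true;
	return false;
}

/*
 * Choose a free port in [first, last], starting at offset "start" and
 * wrapping around.  Every candidate is visited at most once, so the loop
 * always terminates.  Returns 0 when the whole range is busy; the caller
 * would then fail with EADDRINUSE.
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, uint32_t start,
    const uint16_t *used, size_t nused)
{
	uint32_t count, off, i;

	assert(first != 0 && first <= last);
	count = (uint32_t)last - first + 1;
	off = start % count;		/* keeps off + i below 2 * count */
	for (i = 0; i < count; i++) {
		uint16_t port = (uint16_t)(first + (off + i) % count);

		if (!port_is_used(port, used, nused))
			return port;
	}
	return 0;
}
```

The default, high and low ranges then differ only in the (first, last)
pair passed in, which is what makes them sysctl-tunable.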


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.156 17-Apr-2024 bluhm

Use struct ipsec_level within inpcb.

Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is globally visible for netstat(1), so
put struct ipsec_level outside of #ifdef _KERNEL.

OK deraadt@ mvs@


# 1.155 15-Apr-2024 bluhm

Delete unused inp_csumoffset define.

OK mvs@


# 1.154 22-Mar-2024 bluhm

Remove padding from union inpaddru.

Alignment of IPv4 address with lower part of IPv6 address looks
like a leftover from times when IPv6 compatible addresses should
contain IPv4 addresses. Better use a simple union for both IPv4 and
IPv6 addresses like everywhere else. Use this type also for common
zero address.

OK mvs@


# 1.153 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere with how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@
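
The difference between globally unique and per-address unique local
ports lies in which existing bindings count as a conflict. A minimal C
sketch under stated assumptions: IPv4 addresses are plain host-order
uint32_t values, 0 stands for the wildcard address, and struct binding
and port_conflicts() are hypothetical. Treating a wildcard on either side
as a conflict is also an assumption of this sketch. The real check lives
in the kernel's PCB lookup functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY	0u	/* wildcard local address, like INADDR_ANY */

struct binding {
	uint32_t	laddr;	/* host byte order, for simplicity */
	uint16_t	lport;
};

/*
 * Would (laddr, lport) collide with an existing binding?
 * per_laddr == false models the old behaviour: any use of the port is a
 * conflict.  per_laddr == true only treats the same local address, or a
 * wildcard on either side, as a conflict.
 */
static bool
port_conflicts(const struct binding *b, size_t n, uint32_t laddr,
    uint16_t lport, bool per_laddr)
{
	size_t i;

	assert(b != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (b[i].lport != lport)
			continue;
		if (!per_laddr)
			return true;
		if (b[i].laddr == laddr || b[i].laddr == ADDR_ANY ||
		    laddr == ADDR_ANY)
			return true;
	}
	return false;
}
```

With a socket already using 10.0.0.1:5000, the per-address rule lets a
connect(2) from 10.0.0.2 reuse port 5000, while the old global rule does
not.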


Revision tags: OPENBSD_7_5_BASE
# 1.152 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD as the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
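
The usual fix for this class of bug is to shift an unsigned constant.
A minimal C sketch, assuming a 32-bit int as on the platforms OpenBSD
supports. FLAG_LOW, FLAG_HIGH and flag_set() are illustrative names, not
the actual inpcb flag definitions.

```c
#include <assert.h>

/*
 * With a 32-bit int, (1 << 31) shifts a one into the sign bit, which is
 * undefined behaviour.  Shifting an unsigned constant is well defined.
 */
#define FLAG_LOW	(1U << 0)
#define FLAG_HIGH	(1U << 31)	/* not (1 << 31) */

/* Set one flag bit; the argument must have exactly one bit set. */
static unsigned int
flag_set(unsigned int flags, unsigned int bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0);
	return flags | bit;
}
```

Writing the constant as 0x80000000U works equally well; the point is
that the operand of the shift, or the literal itself, is unsigned.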


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
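
The reference counting follows the usual pattern. Every holder (PCB
table, pf mbuf header, pf state key) owns one reference, and whoever
drops the last one frees the object. A minimal C11 sketch using
<stdatomic.h>. struct pcb_sketch and the pcb_* helpers are hypothetical;
the kernel has its own refcnt interface in sys/refcnt.h (see 1.112),
which this does not reproduce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct pcb_sketch {
	atomic_uint	refs;
	int		lport;	/* placeholder for the real PCB fields */
};

static int pcb_frees;	/* counts frees, for demonstration only */

static struct pcb_sketch *
pcb_alloc(void)
{
	struct pcb_sketch *p = malloc(sizeof(*p));

	if (p != NULL) {
		atomic_init(&p->refs, 1);	/* reference held by the table */
		p->lport = 0;
	}
	return p;
}

/* Take an extra reference, e.g. when a pf state key points at the PCB. */
static struct pcb_sketch *
pcb_ref(struct pcb_sketch *p)
{
	unsigned int old = atomic_fetch_add(&p->refs, 1);

	assert(old > 0);	/* must not resurrect a dying PCB */
	return p;
}

/* Drop a reference; the last one frees the PCB. */
static void
pcb_unref(struct pcb_sketch *p)
{
	unsigned int old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		pcb_frees++;
	}
}
```

The atomic decrement returning the previous value is what guarantees
that exactly one releaser sees the count drop to zero.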


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
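
Keying the bucket hash with a secret makes bucket indices unpredictable
to a remote sender, which is what defeats hash flooding. For reference,
below is a self-contained C sketch of SipHash-2-4 written from the
published algorithm. It is not a copy of the OpenBSD kernel's
implementation, and the function name siphash24() is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL(x, n)	(uint64_t)(((x) << (n)) | ((x) >> (64 - (n))))

#define SIPROUND do {							\
	v0 += v1; v1 = ROTL(v1, 13); v1 ^= v0; v0 = ROTL(v0, 32);	\
	v2 += v3; v3 = ROTL(v3, 16); v3 ^= v2;				\
	v0 += v3; v3 = ROTL(v3, 21); v3 ^= v0;				\
	v2 += v1; v1 = ROTL(v1, 17); v1 ^= v2; v2 = ROTL(v2, 32);	\
} while (0)

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t
load64_le(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
uint64_t
siphash24(const uint8_t key[16], const uint8_t *in, size_t len)
{
	uint64_t k0, k1, v0, v1, v2, v3, m, b = (uint64_t)len << 56;
	size_t i, left = len & 7;
	const uint8_t *end;

	assert(key != NULL && in != NULL);
	k0 = load64_le(key);
	k1 = load64_le(key + 8);
	v0 = k0 ^ 0x736f6d6570736575ULL;
	v1 = k1 ^ 0x646f72616e646f6dULL;
	v2 = k0 ^ 0x6c7967656e657261ULL;
	v3 = k1 ^ 0x7465646279746573ULL;
	end = in + len - left;

	for (; in != end; in += 8) {
		m = load64_le(in);
		v3 ^= m;
		SIPROUND; SIPROUND;
		v0 ^= m;
	}
	/* Last block: remaining bytes plus the length in the top byte. */
	for (i = 0; i < left; i++)
		b |= (uint64_t)in[i] << (8 * i);
	v3 ^= b;
	SIPROUND; SIPROUND;
	v0 ^= b;

	v2 ^= 0xff;
	SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}
```

With key bytes 00..0f and the 15-byte message 00..0e, the published test
vector is 0xa129ca6149be45e5. A PCB lookup would hash the address and
port tuple this way and mask the result down to the table size.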


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
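
The growth rule can be sketched as a chained hash table that doubles its
bucket array once the entry count reaches three quarters of the bucket
count. This is a minimal C sketch. The types and functions are
hypothetical, and the multiplicative hash only stands in for the keyed
SipHash the in_pcb code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
	struct entry	*next;
	uint32_t	 key;
};

struct table {
	struct entry	**buckets;
	size_t		  size;		/* bucket count, a power of two */
	size_t		  count;	/* stored entries */
};

/* Stand-in mixing function; it is not keyed, so not flood resistant. */
static size_t
bucket_of(uint32_t key, size_t size)
{
	return (size_t)(key * 2654435761u) & (size - 1);
}

static int
table_init(struct table *t, size_t size)
{
	assert(size > 0 && (size & (size - 1)) == 0);
	t->buckets = calloc(size, sizeof(*t->buckets));
	if (t->buckets == NULL)
		return -1;
	t->size = size;
	t->count = 0;
	return 0;
}

/* Double the bucket array and move every entry to its new bucket. */
static void
table_grow(struct table *t)
{
	size_t nsize = t->size * 2, i;
	struct entry **nb = calloc(nsize, sizeof(*nb));

	if (nb == NULL)
		return;		/* keep the old, fuller table */
	for (i = 0; i < t->size; i++) {
		struct entry *e, *next;

		for (e = t->buckets[i]; e != NULL; e = next) {
			size_t b = bucket_of(e->key, nsize);

			next = e->next;
			e->next = nb[b];
			nb[b] = e;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
}

static void
table_insert(struct table *t, struct entry *e)
{
	size_t b = bucket_of(e->key, t->size);

	e->next = t->buckets[b];
	t->buckets[b] = e;
	t->count++;
	/* Resize once entries reach 75% of the bucket count. */
	if (t->count * 4 >= t->size * 3)
		table_grow(t);
}

static struct entry *
table_lookup(const struct table *t, uint32_t key)
{
	struct entry *e;

	for (e = t->buckets[bucket_of(key, t->size)]; e != NULL; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}
```

Starting from 4 buckets, inserting 100 entries leaves the table at 256
buckets. Doubling keeps the amortized cost of each insert constant.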


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option at the IPPROTO_IP level. It
allows one to retrieve, with recvmsg(2), the original routing domain
of UDP datagrams diverted by pf via "divert-to".

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
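
To picture the bitmask this entry describes, here is a minimal C sketch that keeps one bit per port across the whole 16-bit space. The names `struct portmap`, `portmap_set()`, `portmap_clr()` and `portmap_isset()` are hypothetical; this is not the in_pcb.h definition, only an illustration of the same idea under those assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: one bit per TCP/UDP port, 65536 bits in total,
 * packed into 32-bit words (2048 words = 8 KB).  Not the in_pcb.h code.
 */
#define PORTMAP_BITS	32u
#define PORTMAP_WORDS	(65536u / PORTMAP_BITS)

static_assert(PORTMAP_WORDS * PORTMAP_BITS == 65536u,
    "every port in the 16-bit space has its own bit");

struct portmap {
	uint32_t	word[PORTMAP_WORDS];
};

/* Start with no ports marked as bad. */
static void
portmap_init(struct portmap *m)
{
	memset(m, 0, sizeof(*m));
}

/* Mark a port as "do not allocate dynamically". */
static void
portmap_set(struct portmap *m, uint16_t port)
{
	/* 1u, not 1: shifting a signed int left by 31 is undefined. */
	m->word[port / PORTMAP_BITS] |= 1u << (port % PORTMAP_BITS);
}

/* Allow the port to be picked again. */
static void
portmap_clr(struct portmap *m, uint16_t port)
{
	m->word[port / PORTMAP_BITS] &= ~(1u << (port % PORTMAP_BITS));
}

/* Returns 1 if the port is marked, 0 otherwise. */
static int
portmap_isset(const struct portmap *m, uint16_t port)
{
	return (m->word[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1u;
}
```

Marking the X11 port 6000 with `portmap_set()` makes `portmap_isset()` report it, while a neighbouring port such as 6001 stays unmarked. Packing the bits into 32-bit words keeps each full table (one for TCP, one for UDP) at 8 KB.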


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
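
The pointer caching in this entry amounts to two back-pointers that are set on the first successful lookup and cleared from whichever side is torn down first. The C sketch below models only those links; `struct state_key`, `struct pcb` and the helper functions are hypothetical simplifications, with no locking or reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct pcb;

/* Hypothetical stand-in for a pf state key: only the cached link. */
struct state_key {
	struct pcb	*inp;		/* PCB found earlier, or NULL */
};

/* Hypothetical stand-in for an internet PCB: only the cached link. */
struct pcb {
	struct state_key *sk;		/* state key seen earlier, or NULL */
};

/* After the first PCB lookup for a packet with a state key, link both. */
static void
link_sk_pcb(struct state_key *sk, struct pcb *inp)
{
	assert(sk->inp == NULL && inp->sk == NULL);
	sk->inp = inp;
	inp->sk = sk;
}

/* The PCB is going away: clear the state key's pointer to it. */
static void
pcb_unlink(struct pcb *inp)
{
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}

/* The state key is going away: clear the PCB's pointer to it. */
static void
sk_unlink(struct state_key *sk)
{
	if (sk->inp != NULL) {
		sk->inp->sk = NULL;
		sk->inp = NULL;
	}
}

/* Inbound fast path: cached PCB, or NULL meaning "do a full lookup". */
static struct pcb *
sk_cached_pcb(const struct state_key *sk)
{
	return sk->inp;
}
```

Clearing both pointers in either unlink helper is what keeps the cache from ever handing out a pointer to a freed object, whichever of the two goes away first.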


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header files,
so new code has to use the RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack: the sender sets the TTL to 255,
and the receiver checks that no router on the way (or no more than expected)
reduced the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
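
As a small illustration of the portability rule above, this hypothetical flag word declares its bit-fields as `unsigned int`. The struct and field names are invented for the example and do not come from in_pcb.h.

```c
#include <assert.h>

/*
 * Hypothetical flag word.  C99 guarantees bit-fields of type _Bool,
 * signed int and unsigned int; other base types (such as u_char) are
 * implementation-defined.  A plain int bit-field may be signed or
 * unsigned depending on the compiler, so unsigned int is the portable
 * choice for flags.
 */
struct sock_flags {
	unsigned int	recvif:1;	/* hypothetical one-bit option flag */
	unsigned int	recvttl:1;	/* hypothetical one-bit option flag */
	unsigned int	mode:2;		/* holds values 0..3 */
};

/* Store v in the 2-bit field; unsigned bit-fields keep v modulo 4. */
static void
set_mode(struct sock_flags *f, unsigned int v)
{
	f->mode = v;
}
```

Because the fields are unsigned, storing 5 in the 2-bit `mode` field keeps only the low two bits and yields 1, with no implementation-defined sign behaviour.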


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
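
The reversed match order can be sketched as choosing between the specific (localhost) binding and the wildcard binding on the same port. This is a simplified C model, not the in{6}_pcblookup_listen() code; `struct listener` and `pick_listener()` are hypothetical, and the boolean stands in for the mbuf tag set by pf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listening socket record. */
struct listener {
	const char	*name;
};

/*
 * Pick a listener for an incoming connection on a given port.
 * bound_local: socket bound to the specific (e.g. 127.0.0.1) address, or NULL
 * bound_any:   socket bound to the wildcard address, or NULL
 * Normally the more specific binding wins.  If pf translated the
 * connection to localhost, try the wildcard socket first so the
 * redirected connection is not mistaken for a genuinely local one.
 */
static const struct listener *
pick_listener(const struct listener *bound_local,
    const struct listener *bound_any, bool translated_to_localhost)
{
	const struct listener *first = bound_local, *second = bound_any;

	if (translated_to_localhost) {
		first = bound_any;
		second = bound_local;
	}
	return first != NULL ? first : second;
}
```

With both sockets present, an ordinary connection reaches the localhost-bound socket, while a connection that pf redirected to localhost reaches the wildcard socket; if only one of them exists, it is used either way.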


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.155 15-Apr-2024 bluhm

Delete unused inp_csumoffset define.

OK mvs@


# 1.154 22-Mar-2024 bluhm

Remove padding from union inpaddru.

Alignment of IPv4 address with lower part of IPv6 address looks
like a leftover from times when IPv6 compatible addresses should
contain IPv4 addreses. Better use a simple union for both IPv4 and
IPv6 addresses like everywhere else. Use this type also for common
zero address.

OK mvs@


# 1.153 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@


Revision tags: OPENBSD_7_5_BASE
# 1.152 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embeded there.

OK claudio@


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet doamin.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current inmplementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD has the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Doucument inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
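As a rough illustration of keyed hashing for the PCB table, here is a portable SipHash-2-4 written from the published reference description (it is not the kernel's own SipHash code), plus a hypothetical `tuple_bucket()` that folds an IPv4 address/port 4-tuple into a power-of-two bucket index. Because the 128-bit key is secret, a remote sender cannot predict which tuples land in the same bucket, which is the flooding mitigation the commit refers to. The field layout and helper names are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipHash round over the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *in, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = UINT64_C(0x736f6d6570736575) ^ k0;
	uint64_t v1 = UINT64_C(0x646f72616e646f6d) ^ k1;
	uint64_t v2 = UINT64_C(0x6c7967656e657261) ^ k0;
	uint64_t v3 = UINT64_C(0x7465646279746573) ^ k1;
	uint64_t b = (uint64_t)len << 56;	/* low length byte goes on top */
	size_t i, full = len - (len % 8);

	for (i = 0; i < full; i += 8) {		/* full 8-byte blocks */
		uint64_t m = load_le64(in + i);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	for (i = 0; i < len % 8; i++)		/* remaining tail bytes */
		b |= (uint64_t)in[full + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	v2 ^= 0xff;
	for (i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/*
 * Hypothetical helper: hash an IPv4 4-tuple with the secret key and mask
 * the result into a power-of-two bucket array.  Fields are copied in host
 * byte order, which is fine as long as insert and lookup do the same.
 */
static size_t
tuple_bucket(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, size_t mask)
{
	uint8_t buf[12];

	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &fport, 2);
	memcpy(buf + 6, &laddr, 4);
	memcpy(buf + 10, &lport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

With key bytes 00..0f and an empty message, siphash24() returns 0x726fdb47dd0e0e31, the first published SipHash-2-4 test vector, which is a quick way to check a port of this sketch.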


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
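The growth rule in this entry can be expressed with integer arithmetic only. This is a minimal sketch, assuming a power-of-two bucket count tracked next to an entry count; `struct pcb_table`, `table_needs_grow()` and `table_grow_size()` are hypothetical names, not the kernel's struct inpcbtable or its resize code.

```c
#include <stddef.h>

/* Hypothetical bookkeeping; not the real struct inpcbtable. */
struct pcb_table {
	size_t	size;	/* number of hash buckets, a power of two */
	size_t	count;	/* number of entries currently hashed */
};

/*
 * Nonzero once the entry count reaches 75% of the table size.
 * count >= size * 3/4 is rewritten as count * 4 >= size * 3 to
 * avoid both floating point and truncating division.
 */
static int
table_needs_grow(const struct pcb_table *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Doubling keeps the size a power of two, so mask = size - 1 stays valid. */
static size_t
table_grow_size(const struct pcb_table *t)
{
	return t->size * 2;
}
```

For a 128-bucket table the threshold is 96 entries; at that point the sketch would rehash everything into 256 buckets.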


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
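A bitmask covering the whole 16-bit port space needs one bit per port, i.e. 65536 bits or 2048 32-bit words (8 KiB). The sketch below shows set, clear, and test operations on such a map; the `portmap_*` names and layout are hypothetical and are not the header's own DP_* macros (such as the DP_CLR() added in 1.8), whose exact definitions may differ.

```c
#include <stdint.h>

/* One bit per port: 65536 bits packed into 32-bit words (hypothetical). */
#define PORTMAP_BITS	32
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)	/* 2048 words, 8 KiB */

struct portmap {
	uint32_t	w[PORTMAP_WORDS];
};

/* Mark a port as "do not hand out dynamically". */
static void
portmap_set(struct portmap *m, uint16_t port)
{
	m->w[port / PORTMAP_BITS] |= UINT32_C(1) << (port % PORTMAP_BITS);
}

/* Allow a port to be chosen again. */
static void
portmap_clr(struct portmap *m, uint16_t port)
{
	m->w[port / PORTMAP_BITS] &= ~(UINT32_C(1) << (port % PORTMAP_BITS));
}

/* Nonzero if the port is marked. */
static int
portmap_isset(const struct portmap *m, uint16_t port)
{
	return (m->w[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1;
}
```

Marking TCP port 6000 (the first X11 display port, relevant to 1.66) would set bit 16 of word 187.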


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
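The deadlock mentioned above comes from a port search that keeps looping when every candidate is taken. A bounded search avoids that by trying each port in the range at most once and reporting exhaustion. This is a minimal sketch, not the FreeBSD or OpenBSD code: `pick_port()`, the `port_busy_fn` callback, and the example `in_busy_list()` predicate are all hypothetical, and the caller is assumed to supply a random starting value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: nonzero if the port is taken or reserved. */
typedef int (*port_busy_fn)(uint16_t port, void *arg);

/*
 * Search [first, last] for a free port, starting at offset start % n and
 * wrapping around.  Each candidate is tried at most once, so the loop
 * terminates even when every port is busy; 0 signals exhaustion.
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, uint32_t start,
    port_busy_fn busy, void *arg)
{
	uint32_t n, off, i, cand;

	if (first == 0 || first > last)
		return 0;
	n = (uint32_t)last - first + 1;
	off = start % n;		/* off + i < 2n, no overflow */
	for (i = 0; i < n; i++) {
		cand = first + (off + i) % n;
		if (!busy((uint16_t)cand, arg))
			return (uint16_t)cand;
	}
	return 0;
}

/* Example predicate: ports listed in a small array count as busy. */
struct busy_list {
	const uint16_t	*ports;
	size_t		 n;
};

static int
in_busy_list(uint16_t port, void *arg)
{
	const struct busy_list *bl = arg;

	for (size_t i = 0; i < bl->n; i++)
		if (bl->ports[i] == port)
			return 1;
	return 0;
}
```

Returning 0 lets the caller fail the bind with an error such as EADDRNOTAVAIL instead of spinning; choosing the start offset randomly spreads allocations across the range.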


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.154 22-Mar-2024 bluhm

Remove padding from union inpaddru.

Alignment of IPv4 address with lower part of IPv6 address looks
like a leftover from times when IPv6 compatible addresses should
contain IPv4 addreses. Better use a simple union for both IPv4 and
IPv6 addresses like everywhere else. Use this type also for common
zero address.

OK mvs@


# 1.153 22-Mar-2024 bluhm

Make local port which is bound during connect(2) unique per laddr.

in_pcbconnect() did not pass down the address it got from in_pcbselsrc()
to in_pcbpickport(). As a consequence local port numbers selected
during connect(2) were globally unique although they belong to
different addresses. This strict uniqueness is not necessary and
wastes usable ports for outgoing connections.

To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked().
This does not interfere how wildcard sockets are matched with
specific sockets during bind(2). It only allows non-wildcard sockets
to share a local port during connect(2).

OK mvs@ deraadt@


Revision tags: OPENBSD_7_5_BASE
# 1.152 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embeded there.

OK claudio@


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet doamin.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current inmplementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD has the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Doucument inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR requests, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
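
A minimal C sketch of the kind of fix 1.122 describes. The actual in_pcb.h
diff is not shown in this log, and the flag names below are hypothetical.
Assuming a 32-bit int, (1 << 31) shifts a 1 into the sign bit, which is
undefined; shifting an unsigned constant keeps bit 31 well defined.

```c
#include <stdint.h>

/*
 * Hypothetical flag bits, for illustration only (not from in_pcb.h).
 * Assumes 32-bit int: (1 << 31) would be undefined behavior,
 * (1U << 31) is well defined and equals 0x80000000.
 */
#define EXAMPLE_FLAG_LOW	(1U << 0)
#define EXAMPLE_FLAG_HIGH	(1U << 31)	/* not (1 << 31) */

/* Return nonzero if the high flag bit is set in flags. */
static inline int
example_has_high_flag(uint32_t flags)
{
	return (flags & EXAMPLE_FLAG_HIGH) != 0;
}
```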


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer; OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
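
The growth rule from 1.86 can be sketched as follows. This is an
illustration under stated assumptions, not the kernel code: the struct and
function names are hypothetical, and the 75% check is done in integer
arithmetic (count * 4 >= size * 3) to avoid floating point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table summary; not the kernel's struct inpcbtable. */
struct example_pcbtable {
	size_t	count;	/* number of PCBs currently hashed */
	size_t	size;	/* number of hash buckets */
};

/* True once the entry count reaches 75% of the bucket count. */
static bool
example_needs_resize(const struct example_pcbtable *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Bucket count to use after a resize: double the current size. */
static size_t
example_next_size(const struct example_pcbtable *t)
{
	return t->size * 2;
}
```

With 128 buckets, the 96th entry reaches the threshold and the table would
grow to 256 buckets.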


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
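
A sketch of a bitmask spanning all 65536 ports, as 1.65 describes, assuming
32-bit words (2048 words, 8 KB per protocol). The names are hypothetical;
the real DP_SET/DP_CLR/DP_ISSET macros live in in_pcb.h and may differ in
detail.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not the definitions in in_pcb.h. */
#define EX_NPORTS	65536
#define EX_WORDBITS	32
#define EX_NWORDS	(EX_NPORTS / EX_WORDBITS)	/* 2048 words */

struct ex_baddynamic {
	uint32_t	words[EX_NWORDS];	/* one bit per port number */
};

/* Clear the whole map: no port is marked as reserved. */
static void
ex_dp_init(struct ex_baddynamic *m)
{
	memset(m->words, 0, sizeof(m->words));
}

/* Mark port as one the dynamic allocator must skip. */
static void
ex_dp_set(struct ex_baddynamic *m, uint16_t port)
{
	m->words[port / EX_WORDBITS] |= 1U << (port % EX_WORDBITS);
}

/* Unmark port, in the spirit of DP_CLR() from revision 1.8. */
static void
ex_dp_clr(struct ex_baddynamic *m, uint16_t port)
{
	m->words[port / EX_WORDBITS] &= ~(1U << (port % EX_WORDBITS));
}

/* Return nonzero if port is marked. */
static int
ex_dp_isset(const struct ex_baddynamic *m, uint16_t port)
{
	return (m->words[port / EX_WORDBITS] >> (port % EX_WORDBITS)) & 1U;
}
```

Port 853, added to the TCP list in 1.118, would land in word 26, bit 21.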


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf balony.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.152 13-Feb-2024 bluhm

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.

OK claudio@


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet doamin.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current inmplementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD has the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Doucument inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the incpb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain save and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option fo tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf balony.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two iocts used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep an policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

amove in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.151 11-Feb-2024 bluhm

Remove include netinet6/ip6_var.h from netinet/in_pcb.h.

OK mvs@


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
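Shifting a signed `1` left by 31 overflows a 32-bit `int`; writing `1U` keeps the shift in unsigned arithmetic, where bit 31 is well defined. A minimal sketch of a per-port bitmap of the kind this header keeps for the baddynamic lists, assuming hypothetical `portmap_*` helpers rather than the header's actual macros:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one bit per port, 32 ports per word. */
#define PORTMAP_BITS	32u
#define PORTMAP_WORDS	(65536u / PORTMAP_BITS)

/* 1U << 31 is defined; 1 << 31 would overflow a 32-bit signed int. */
void
portmap_set(uint32_t *map, unsigned int port)
{
	assert(port < 65536u);
	map[port / PORTMAP_BITS] |= 1U << (port % PORTMAP_BITS);
}

void
portmap_clr(uint32_t *map, unsigned int port)
{
	assert(port < 65536u);
	map[port / PORTMAP_BITS] &= ~(1U << (port % PORTMAP_BITS));
}

/* Returns 1 if the port's bit is set, 0 otherwise. */
int
portmap_isset(const uint32_t *map, unsigned int port)
{
	assert(port < 65536u);
	return (int)((map[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1U);
}
```

Port 31 lands in bit 31 of word 0, which is exactly the case that needs the unsigned constant.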


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
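The growth rule above (double the table when the entry count reaches 75% of its size) can be sketched as follows. The `struct table` type and `table_*` functions are hypothetical, and rehashing existing entries into the larger table is left out:

```c
#include <stddef.h>

/* Hypothetical bookkeeping only; not the real inpcbtable layout. */
struct table {
	size_t count;	/* entries currently hashed */
	size_t size;	/* number of buckets, a power of two */
};

/* Nonzero once count reaches 75% of size: count >= 3/4 * size. */
int
table_needs_grow(const struct table *t)
{
	return 4 * t->count >= 3 * t->size;
}

/* Account for one insertion, then double the bucket count when the
 * load threshold is reached (moving entries to new buckets omitted). */
void
table_insert(struct table *t)
{
	t->count++;
	if (table_needs_grow(t))
		t->size *= 2;
}
```

Comparing `4 * count >= 3 * size` keeps the 75% test in integer arithmetic; with 8 buckets the table doubles on the 6th insertion.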


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.150 31-Jan-2024 bluhm

Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.

Splitting the IPv6 code into a separate function results in less
#ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of
the correct type and in_pcbrtentry() does not rely on the fact that
inp_route and inp_route6 are pointers to the same union.

OK kn@ claudio@


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
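The growth rule described above can be sketched in C as follows. This assumes a hypothetical struct pcb_table that only tracks the bucket count and entry count; it is not the actual OpenBSD code, and a real table would also allocate the new bucket array and rehash every entry. Comparing count * 4 against size * 3 tests the 75% threshold without floating point.

```c
#include <stddef.h>

/* Hypothetical table: bucket count (a power of two) and entry count. */
struct pcb_table {
	size_t size;   /* number of hash buckets */
	size_t count;  /* number of inserted entries */
};

/* Nonzero once entries have reached 75% of the bucket count. */
int
pcb_table_needs_grow(const struct pcb_table *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Record one insertion and double the bucket count at the threshold. */
void
pcb_table_insert(struct pcb_table *t)
{
	t->count++;
	if (pcb_table_needs_grow(t))
		t->size *= 2;   /* a real table would rehash all entries here */
}
```

With 16 buckets, the first 11 insertions leave the size unchanged; the 12th reaches 12 * 4 = 48 = 16 * 3 and doubles the table to 32 buckets.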


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
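Covering the whole port space with one bit per port costs 8 KiB per protocol. The sketch below shows the usual bitmap technique in userland C. The macro and function names are hypothetical; the real in_pcb.h macros (DP_CLR() from 1.8 and its siblings) may be spelled and typed differently.

```c
#include <stdint.h>

/* One bit per port: 65536 bits = 2048 32-bit words = 8 KiB per protocol. */
#define PORTMAP_WORDBITS	32u
#define PORTMAP_WORDS		(65536u / PORTMAP_WORDBITS)

/* Shift 1u (not 1) so that setting bit 31 of a word stays well defined. */
static void
portmap_set(uint32_t *map, uint16_t port)
{
	map[port / PORTMAP_WORDBITS] |= 1u << (port % PORTMAP_WORDBITS);
}

static void
portmap_clr(uint32_t *map, uint16_t port)
{
	map[port / PORTMAP_WORDBITS] &= ~(1u << (port % PORTMAP_WORDBITS));
}

static int
portmap_isset(const uint32_t *map, uint16_t port)
{
	return (map[port / PORTMAP_WORDBITS] >> (port % PORTMAP_WORDBITS)) & 1u;
}
```

Port 65535 maps to word 2047, bit 31, which is why the map needs exactly 2048 words and why the shifted constant must be unsigned.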


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
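The caching scheme in 1.64 is a pair of back pointers that are set after the first successful lookup and cleared when either side is removed. The sketch below is a hypothetical, single-threaded simplification: struct state_key, struct pcb and all function names are invented stand-ins, not the pf or inpcb definitions. Later revisions (see 1.146) protect these pointers with a mutex, which is omitted here.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a pf state key and a PCB. */
struct pcb;
struct state_key {
	struct pcb		*sk_pcb;	/* cached PCB or NULL */
};
struct pcb {
	struct state_key	*pcb_sk;	/* cached state key or NULL */
};

static void
link_sk_pcb(struct state_key *sk, struct pcb *inp)
{
	sk->sk_pcb = inp;
	inp->pcb_sk = sk;
}

/* Called when the PCB goes away: clear both directions so nothing dangles. */
static void
pcb_unlink_sk(struct pcb *inp)
{
	if (inp->pcb_sk != NULL) {
		inp->pcb_sk->sk_pcb = NULL;
		inp->pcb_sk = NULL;
	}
}

/* Inbound: reuse the cached PCB if present, else do the full lookup and cache it. */
static struct pcb *
inbound_pcb(struct state_key *sk, struct pcb *(*full_lookup)(void))
{
	struct pcb *inp;

	if (sk != NULL && sk->sk_pcb != NULL)
		return sk->sk_pcb;
	inp = full_lookup();
	if (sk != NULL && inp != NULL)
		link_sk_pcb(sk, inp);
	return inp;
}

/* Demo lookup that counts how often the slow path runs. */
static struct pcb demo_pcb;
static int full_lookups;

static struct pcb *
demo_full_lookup(void)
{
	full_lookups++;
	return &demo_pcb;
}
```

Only the first packet pays for the hash lookup; later packets of the same connection follow the cached pointer.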


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
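The TTL security check described in 1.57 is a single comparison on the receive side. The sketch below is hypothetical; minttl_accept() and minttl_for_hops() are invented names that illustrate the arithmetic, not the kernel's code path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical receive-side check in the spirit of IP_MINTTL.
 * The peer sends with TTL 255 and every router hop decrements it by
 * one, so minttl = 255 accepts only directly connected peers and
 * minttl = 255 - n tolerates up to n hops.  minttl = 0 accepts all.
 */
static bool
minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
	return pkt_ttl >= minttl;
}

/* Minimum TTL to configure if the peer may be up to `hops` routers away. */
static uint8_t
minttl_for_hops(uint8_t hops)
{
	return (uint8_t)(255 - hops);
}
```

A remote attacker cannot forge a TTL that is still 255 after crossing a router, which is what makes the check useful. From userland the option is set with setsockopt(2) at level IPPROTO_IP; see ip(4) for the exact argument type on a given release.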


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
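The port-range selection and the deadlock fix quoted in 1.3 can be sketched as a bounded scan over a chosen range. This is a hypothetical illustration: the enum, pick_port() and the demo callbacks are invented, and the range bounds (1024-49151, 49152-65535, 600-1023) are assumed defaults, since the real values are sysctl-tunable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of the default, high and low port ranges. */
enum portrange { RANGE_DEFAULT, RANGE_HIGH, RANGE_LOW };

struct range {
	uint16_t	first;
	uint16_t	last;
};

/* Assumed bounds for illustration; the kernel reads them from sysctls. */
static const struct range ranges[] = {
	[RANGE_DEFAULT]	= { 1024, 49151 },
	[RANGE_HIGH]	= { 49152, 65535 },
	[RANGE_LOW]	= { 600, 1023 },
};

/*
 * Scan the range once, starting at offset `start` (random in practice).
 * Return 0 when every port is in use instead of looping forever.
 */
static uint16_t
pick_port(enum portrange r, uint32_t start, bool (*in_use)(uint16_t))
{
	const struct range *rg = &ranges[r];
	uint32_t n = (uint32_t)rg->last - rg->first + 1;

	for (uint32_t i = 0; i < n; i++) {
		uint16_t p = (uint16_t)(rg->first + (start + i) % n);
		if (!in_use(p))
			return p;
	}
	return 0;
}

static bool
none_in_use(uint16_t port)
{
	(void)port;
	return false;
}

static bool
all_in_use(uint16_t port)
{
	(void)port;
	return true;
}
```

Bounding the loop by the size of the range is what turns "no free port" into an error return rather than an infinite loop.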


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.149 28-Jan-2024 bluhm

Use more specific sockaddr type for inpcb notify.

in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more spcific type
and remove the runtime check at beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in6_pcbnotify() as dst parameter.

OK millert@


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
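The GRAB/HOLD convention from 1.143 lets one lookup function serve both standalone callers and callers that already hold the table mutex. The sketch below is a hypothetical userland analogue using pthreads; LOCK_GRAB, LOCK_HOLD, struct table and its functions are invented names, and the reference counting the commit mentions for returned inps is omitted.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical analogue of IN_PCBLOCK_GRAB / IN_PCBLOCK_HOLD. */
enum lockmode { LOCK_GRAB, LOCK_HOLD };

#define TABLE_MAX	8

struct table {
	pthread_mutex_t	mtx;
	int		ports[TABLE_MAX];	/* protected by mtx */
	size_t		n;			/* protected by mtx */
};

static struct table demo_table = { .mtx = PTHREAD_MUTEX_INITIALIZER };

/* GRAB: take and release mtx here.  HOLD: the caller already holds mtx. */
static int
table_lookup(struct table *t, int port, enum lockmode mode)
{
	int found = 0;

	if (mode == LOCK_GRAB)
		pthread_mutex_lock(&t->mtx);
	for (size_t i = 0; i < t->n; i++) {
		if (t->ports[i] == port) {
			found = 1;
			break;
		}
	}
	if (mode == LOCK_GRAB)
		pthread_mutex_unlock(&t->mtx);
	return found;
}

/*
 * bind-like: the uniqueness check and the insert share one critical
 * section, so two concurrent callers cannot both claim the same port.
 */
static int
table_bind(struct table *t, int port)
{
	int ok = 0;

	pthread_mutex_lock(&t->mtx);
	if (!table_lookup(t, port, LOCK_HOLD) && t->n < TABLE_MAX) {
		t->ports[t->n++] = port;
		ok = 1;
	}
	pthread_mutex_unlock(&t->mtx);
	return ok;
}
```

Without the HOLD mode, table_bind() would have to drop the mutex between check and insert, reopening exactly the race the commit closes.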


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@
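The point of 1.136 is that the expensive hash can be computed before taking the table mutex, as long as the secret never changes; only the cheap masking and bucket walk need the lock. The sketch below is hypothetical: pcb_hash() is a stand-in mixer, explicitly NOT SipHash and not attack-resistant, and struct pcbtable, pcb_bucket() and pcb_resize() are invented names.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical: the secret is fixed at initialization and never
 * reseeded on resize, so a hash computed without the table mutex
 * is still valid once the mutex is taken.
 */
static const uint64_t hash_secret = 0x9e3779b97f4a7c15ULL;

/* Stand-in mixer: NOT SipHash and not attack-resistant. */
static uint64_t
pcb_hash(uint32_t faddr, uint16_t fport, uint32_t laddr, uint16_t lport)
{
	uint64_t h = hash_secret;

	h ^= ((uint64_t)faddr << 16) | fport;
	h *= 0xff51afd7ed558ccdULL;
	h ^= ((uint64_t)laddr << 16) | lport;
	h *= 0xc4ceb9fe1a85ec53ULL;
	return h ^ (h >> 33);
}

struct pcbtable {
	pthread_mutex_t	mtx;
	uint64_t	mask;	/* bucket count - 1, protected by mtx */
};

static struct pcbtable demo_table = { PTHREAD_MUTEX_INITIALIZER, 15 };

/* Only the masking (and, in a real table, the bucket walk) runs under mtx. */
static uint64_t
pcb_bucket(struct pcbtable *t, uint64_t hash)
{
	uint64_t b;

	pthread_mutex_lock(&t->mtx);
	b = hash & t->mask;
	pthread_mutex_unlock(&t->mtx);
	return b;
}

/* Resizing changes only the mask; hashes computed earlier remain usable. */
static void
pcb_resize(struct pcbtable *t, uint64_t nbuckets)	/* power of two */
{
	pthread_mutex_lock(&t->mtx);
	t->mask = nbuckets - 1;
	pthread_mutex_unlock(&t->mtx);
}
```

If the secret were reseeded on resize, a hash computed before taking the mutex could refer to the old seed, which is why the commit stops reseeding.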


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR requests, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@
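The change in 1.129 (together with the error-return convention from 1.85) can be illustrated by a function that returns an errno value and copies the result into caller-owned storage before releasing the lock. The sketch below is hypothetical: the global address, set_src() and select_src() are invented stand-ins for the kernel's interface address state and in_pcbselsrc().

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared state standing in for an interface address. */
static pthread_mutex_t ifa_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct in_addr ifa_addr;		/* protected by ifa_mtx */
static int ifa_valid;			/* protected by ifa_mtx */

static void
set_src(uint32_t addr_host_order)
{
	pthread_mutex_lock(&ifa_mtx);
	ifa_addr.s_addr = htonl(addr_host_order);
	ifa_valid = 1;
	pthread_mutex_unlock(&ifa_mtx);
}

/*
 * Return 0 and copy the address into caller-owned *res while the lock
 * is still held, or return an errno value.  No pointer into the shared
 * state escapes, so a later change cannot leave the caller dangling.
 */
static int
select_src(struct in_addr *res)
{
	int error = 0;

	pthread_mutex_lock(&ifa_mtx);
	if (ifa_valid)
		*res = ifa_addr;
	else
		error = EADDRNOTAVAIL;
	pthread_mutex_unlock(&ifa_mtx);
	return error;
}
```

Returning a pointer to ifa_addr instead would be correct only as long as nobody modifies or frees it after the unlock, which is the lifetime problem the commit describes.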


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@
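The fix in 1.128 follows a common pattern: collect the matching PCBs into a temporary list while holding the table mutex, release it, and only then call the delivery routine that may take other locks. The sketch below is hypothetical; deliver() stands in for udp_sbappend()/sorwakeup(), and it assumes the PCBs cannot be freed between the two loops (the kernel relies on the exclusive net lock for that, per the commit).

```c
#include <pthread.h>
#include <stddef.h>

#define MAXPCB	16

struct pcb {
	int	port;
	int	delivered;
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct pcb *table[MAXPCB];	/* protected by table_mtx */
static size_t ntable;			/* protected by table_mtx */

static void
register_pcb(struct pcb *p)
{
	pthread_mutex_lock(&table_mtx);
	if (ntable < MAXPCB)
		table[ntable++] = p;
	pthread_mutex_unlock(&table_mtx);
}

/* Stand-in for socket delivery, which may grab other (sleeping) locks. */
static void
deliver(struct pcb *p)
{
	p->delivered++;
}

static size_t
deliver_multicast(int port)
{
	struct pcb *tmp[MAXPCB];
	size_t n = 0;

	/* Phase 1: collect matches while holding the mutex. */
	pthread_mutex_lock(&table_mtx);
	for (size_t i = 0; i < ntable; i++)
		if (table[i]->port == port)
			tmp[n++] = table[i];
	pthread_mutex_unlock(&table_mtx);

	/* Phase 2: deliver without holding the mutex. */
	for (size_t i = 0; i < n; i++)
		deliver(tmp[i]);
	return n;
}
```

Splitting the loop in two removes the mutex-then-kernel-lock ordering that witness complained about.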


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
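The rule behind 1.122: with a 32-bit int, 1 << 31 produces a value that does not fit in int, so the behavior is undefined, while 1U << 31 is a well-defined 0x80000000. The commit does not say which expressions were changed; the helper below is a generic, hypothetical illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bit n of a 32-bit word.  The unsigned constant
 * keeps bit 31 well defined, and masking keeps the shift count in range
 * (shifting by 32 or more is also undefined).
 */
static inline uint32_t
bit32(unsigned int n)
{
	return 1U << (n & 31u);
}
```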


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
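Reference counting as introduced in 1.111 means every holder (the PCB queue and hashes, a pf mbuf header, a pf state key) owns one reference, and the last release frees the object. The sketch below is a hypothetical userland version with C11 atomics; the kernel presumably uses refcnt(9), given that 1.112 includes sys/refcnt.h, and struct pcb here is an invented stand-in.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical PCB stand-in with a reference count. */
struct pcb {
	atomic_uint	refs;
	int		port;
};

static struct pcb *
pcb_alloc(int port)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL) {
		atomic_init(&p->refs, 1);	/* reference held by the table */
		p->port = port;
	}
	return p;
}

/* Each additional holder takes its own reference. */
static struct pcb *
pcb_ref(struct pcb *p)
{
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Drop one reference; the last one frees the PCB.  Returns 1 if freed. */
static int
pcb_unref(struct pcb *p)
{
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		free(p);
		return 1;
	}
	return 0;
}
```

With this scheme, removing a PCB from the table no longer frees it while a state key still points at it; the memory goes away only when the state key drops its reference too.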


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(). This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as the only objects referenced by the ipsec_ref structure,
and their handling requires some changes to support more advanced matching
of IPsec connections.

No objections from reyk and hshoexer; OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
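The growth rule above can be sketched as follows. The sketch assumes power-of-two table sizes, so a bucket index can be taken with a hash mask (compare the inpt_mask field mentioned in 1.109). The helper names are hypothetical, and the 75% threshold is checked with integer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the table size to use for `count` entries
 * in a table of `size` buckets. Double the size once the entries reach
 * 75% of the buckets; count * 4 >= size * 3 avoids floating point.
 */
static size_t
pcb_next_hashsize(size_t count, size_t size)
{
	if (count * 4 >= size * 3)
		return size * 2;
	return size;
}

/*
 * Hypothetical helper: map a hash value to a bucket. Only valid when
 * size is a power of two, so that size - 1 is an all-ones mask.
 */
static size_t
pcb_bucket(uint32_t hash, size_t size)
{
	return hash & (size - 1);
}
```

With 128 buckets, 95 entries keep the table at 128, while the 96th entry (exactly 75%) triggers doubling to 256.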


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option at the IPPROTO_IP level,
which allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
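The bitmask described above can be sketched as one bit per port: 65536 bits held in 2048 32-bit words. The names below are hypothetical stand-ins in the spirit of the DP_CLR() macro from 1.8, not the header's actual definitions. The shift uses an unsigned 1U, so setting bit 31 stays well defined (see 1.122).

```c
#include <stdint.h>

/* One bit per port: 65536 ports in 2048 words of 32 bits. */
#define PORTMAP_BITS	32
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)

/* Hypothetical port bitmap, not the kernel's baddynamic structure. */
struct portmap {
	uint32_t	map[PORTMAP_WORDS];
};

/* Mark a port as "do not allocate dynamically". */
static void
portmap_set(struct portmap *pm, uint16_t port)
{
	pm->map[port / PORTMAP_BITS] |= 1U << (port % PORTMAP_BITS);
}

/* Make a port available for dynamic allocation again. */
static void
portmap_clr(struct portmap *pm, uint16_t port)
{
	pm->map[port / PORTMAP_BITS] &= ~(1U << (port % PORTMAP_BITS));
}

/* Return 1 if the port is marked, 0 otherwise. */
static int
portmap_isset(const struct portmap *pm, uint16_t port)
{
	return (pm->map[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1;
}
```

For example, reserving TCP port 587 (see 1.23) would be a single portmap_set() call, and the dynamic port picker would skip any port for which portmap_isset() returns 1.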


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.148 09-Jan-2024 bluhm

Convert some struct inpcb parameter to const pointer.

OK millert@


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
the PRU_SOCKADDR request, so keep this behaviour for a while instead of
making the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
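As an illustration, assuming a 32-bit int as on the platforms OpenBSD supports: `1 << 31` shifts a 1 into the sign bit of a signed int, which is undefined behavior, while `1U << 31` is well defined and yields 0x80000000. EXAMPLE_FLAG and flag_isset() are hypothetical names, not the flag that was actually changed in the header.

```c
/*
 * Undefined with a 32-bit int, because 2^31 does not fit in int:
 *	#define EXAMPLE_FLAG_BAD	(1 << 31)
 * Well defined, because the shift is done on an unsigned int:
 */
#define EXAMPLE_FLAG	(1U << 31)

/* Hypothetical helper: test one bit in a flags word. */
static int
flag_isset(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```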


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
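The growth rule can be sketched as follows. pcb_table_newsize() is a hypothetical helper, not the kernel function; it only encodes the rule stated in the message (double the table once the entry count reaches 75% of its size), with the threshold computed in integer arithmetic to avoid floating point.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the table size to use given the current
 * number of entries and buckets.  Grows by doubling when
 * count >= 0.75 * size, otherwise keeps the current size.
 */
static size_t
pcb_table_newsize(size_t count, size_t size)
{
	if (count * 4 >= size * 3)
		return size * 2;
	return size;
}
```

For a 128-bucket table, 95 entries leave the size unchanged, while the 96th entry (exactly 75%) triggers a resize to 256.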


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
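To show why a bitmask over the full port space is cheap: one bit per port for 65536 ports is 2048 32-bit words, 8 KB per protocol. The sketch below uses hypothetical names (struct portmap, portmap_set() and friends) rather than the header's actual DP_* macros, whose exact definitions are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* One bit per port over the whole 16-bit port space. */
#define PORTMAP_BITS	32
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)	/* 2048 words */

struct portmap {
	uint32_t	words[PORTMAP_WORDS];
};

static void
portmap_init(struct portmap *pm)
{
	memset(pm, 0, sizeof(*pm));
}

/* Word index is port / 32, bit index within the word is port % 32. */
static void
portmap_set(struct portmap *pm, uint16_t port)
{
	pm->words[port / PORTMAP_BITS] |= 1U << (port % PORTMAP_BITS);
}

static void
portmap_clr(struct portmap *pm, uint16_t port)
{
	pm->words[port / PORTMAP_BITS] &= ~(1U << (port % PORTMAP_BITS));
}

static int
portmap_isset(const struct portmap *pm, uint16_t port)
{
	return (pm->words[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1U;
}
```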


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.147 03-Jan-2024 bluhm

Run connect(2) in parallel within inet domain.

This unlocks soconnect() for UDP, rip, rip6 and divert. It takes
shared net lock in combination with per socket lock. TCP and GRE
still use exclusive net lock when connecting.

OK mvs@


# 1.146 01-Jan-2024 bluhm

Protect link between pf and inp with mutex.

Introduce global mutex to protect the pointers between pf state key
and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do
not need exclusive netlock anymore. Use a bunch of read once
unlocked access to reduce performance impact.

OK sashan@


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@
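The GRAB/HOLD convention can be modelled in user space as below. Everything here is hypothetical: LOCK_GRAB and LOCK_HOLD stand in for IN_PCBLOCK_GRAB and IN_PCBLOCK_HOLD, an int flag stands in for the table mutex, and table_bind() shows why a caller that already holds the lock needs a lookup variant that does not take it again, so that the uniqueness check and the update share one critical section.

```c
#include <assert.h>

/* Hypothetical stand-ins for IN_PCBLOCK_GRAB / IN_PCBLOCK_HOLD. */
enum lock_mode { LOCK_GRAB, LOCK_HOLD };

struct table {
	int	locked;		/* models the table mutex */
	int	bound_port;	/* 0 means "free" in this sketch */
};

static void
table_lock(struct table *t)
{
	assert(!t->locked);	/* a mutex is not recursive */
	t->locked = 1;
}

static void
table_unlock(struct table *t)
{
	assert(t->locked);
	t->locked = 0;
}

/* Lookup that either takes the lock itself or expects the caller to. */
static int
table_lookup(struct table *t, enum lock_mode mode)
{
	int port;

	if (mode == LOCK_GRAB)
		table_lock(t);
	else
		assert(t->locked);	/* LOCK_HOLD: caller holds it */
	port = t->bound_port;
	if (mode == LOCK_GRAB)
		table_unlock(t);
	return port;
}

/* Check and set in one critical section, so two binds cannot both win. */
static int
table_bind(struct table *t, int port)
{
	int error = 0;

	table_lock(t);
	if (table_lookup(t, LOCK_HOLD) != 0)
		error = -1;	/* already bound */
	else
		t->bound_port = port;
	table_unlock(t);
	return error;
}
```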


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as a parameter. This is only used to look up
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
the PRU_SOCKADDR request, so keep this behaviour for a while instead of
making the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@
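A hedged sketch of the copy-out pattern, assuming a pthread mutex in place of the kernel lock. struct srcstate, set_preferred() and selsrc_copy() are hypothetical and do not mirror the real in_pcbselsrc() signature. The selected address is copied into the caller's struct in_addr before the lock is released. Nothing refers to the shared state afterwards.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>

struct srcstate {
	pthread_mutex_t	mtx;
	struct in_addr	preferred;	/* may change whenever the lock is not held */
};

static void
set_preferred(struct srcstate *s, struct in_addr a)
{
	pthread_mutex_lock(&s->mtx);
	s->preferred = a;
	pthread_mutex_unlock(&s->mtx);
}

/*
 * Copy the chosen address into caller memory while the lock is held,
 * instead of returning a pointer into state that may change later.
 */
static int
selsrc_copy(struct srcstate *s, struct in_addr *out)
{
	int error = 0;

	pthread_mutex_lock(&s->mtx);
	if (s->preferred.s_addr == htonl(INADDR_ANY))
		error = EADDRNOTAVAIL;
	else
		*out = s->preferred;
	pthread_mutex_unlock(&s->mtx);
	return error;
}
```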


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@
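A simplified sketch of the collect-then-process pattern, assuming a pthread mutex for the table lock. struct pcb, struct table, deliver() and broadcast() are hypothetical. notify_next only mimics the role of the inp_notify list entry. In the kernel the collected PCBs must also stay valid until the second phase runs, and the log says the list is protected by the exclusive net lock. This sketch does not model that.

```c
#include <pthread.h>
#include <stddef.h>

struct pcb {
	struct pcb	*next;		/* table list linkage */
	struct pcb	*notify_next;	/* temporary list, cf. inp_notify */
	int		 port;
	int		 delivered;
};

struct table {
	pthread_mutex_t	 mtx;
	struct pcb	*head;
};

/* Placeholder for work that may sleep or take other locks (e.g. a wakeup). */
static void
deliver(struct pcb *p)
{
	p->delivered++;
}

static int
broadcast(struct table *t, int port)
{
	struct pcb *list = NULL, *p;
	int n = 0;

	/* Phase 1: while holding the mutex, only collect the matches. */
	pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port) {
			p->notify_next = list;
			list = p;
		}
	}
	pthread_mutex_unlock(&t->mtx);

	/* Phase 2: with the mutex released, do the work that may sleep. */
	for (p = list; p != NULL; p = p->notify_next) {
		deliver(p);
		n++;
	}
	return n;
}
```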


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
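A minimal sketch of the reference counting idea using C11 atomics. The kernel has its own refcnt(9) interface in sys/refcnt.h. struct pcb, pcb_alloc(), pcb_ref() and pcb_unref() here are hypothetical. Each holder (PCB queue and hashes, pf mbuf header, pf state key) owns one reference, and the last release frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct pcb {
	atomic_uint	refs;	/* one per holder: table, mbuf header, state key */
	int		port;
};

static struct pcb *
pcb_alloc(int port)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return NULL;
	atomic_init(&p->refs, 1);	/* reference owned by the PCB table */
	p->port = port;
	return p;
}

static struct pcb *
pcb_ref(struct pcb *p)
{
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Drop one reference; returns true if this call freed the PCB. */
static bool
pcb_unref(struct pcb *p)
{
	unsigned int old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		return true;
	}
	return false;
}
```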


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
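A sketch of the growth rule, assuming a power-of-two bucket count indexed with a hash mask. next_table_size() and bucket_of() are hypothetical helpers. The log does not say whether the real code compares with >= or > at exactly 75%. Integer arithmetic (entries * 4 >= size * 3) avoids floating point.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the bucket count to use once the table holds nentries entries. */
static size_t
next_table_size(size_t nentries, size_t size)
{
	/* Grow when entries reach 75% of the buckets; integer math, no floats. */
	if (nentries * 4 >= size * 3)
		return size * 2;
	return size;
}

/* With a power-of-two size the bucket index is hash & (size - 1). */
static size_t
bucket_of(uint64_t hash, size_t size)
{
	return (size_t)(hash & (size - 1));
}
```

Because the mask widens when the size doubles, every entry has to be rehashed into the new table; the same hash can land in a different bucket afterwards.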


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option at the IPPROTO_IP level, which
allows one to retrieve the original routing domain of UDP datagrams
diverted by pf via "divert-to" with recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
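A minimal sketch of a bitmask covering the full 16-bit port space, in the spirit of the DP_SET()/DP_CLR() macros. The portmap names are hypothetical. The layout (32-bit words, with port / 32 selecting the word) is an assumption, not a copy of the header. Shifting an unsigned 1 keeps the shift into bit 31 well defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per port over the whole 16-bit port space: 65536 / 32 words. */
#define PORTMAP_BITS	32u
#define PORTMAP_WORDS	(65536u / PORTMAP_BITS)

struct portmap {
	uint32_t	map[PORTMAP_WORDS];
};

static void
portmap_set(struct portmap *m, uint16_t port)
{
	m->map[port / PORTMAP_BITS] |= UINT32_C(1) << (port % PORTMAP_BITS);
}

static void
portmap_clr(struct portmap *m, uint16_t port)
{
	m->map[port / PORTMAP_BITS] &= ~(UINT32_C(1) << (port % PORTMAP_BITS));
}

static bool
portmap_isset(const struct portmap *m, uint16_t port)
{
	return (m->map[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1u;
}
```

Covering all 65536 ports costs 8 KB per map (2048 words of 4 bytes). This is why such a map can mark any port, such as 853 or the X11 range, and not only ports between 512 and 1024.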


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected)
reduced the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will
be brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.145 18-Dec-2023 bluhm

Run bind(2) system call in parallel.

For protocols that care about locking, use the shared net lock to
call sobind(). Use the per socket rwlock together with shared net
lock. This affects protocols UDP, raw IP, and divert. Move the
inpcb mutex locking into soreceive(), it is only used there. Add
a comment to describe the current implementation of inpcb locking.

OK mvs@ sashan@


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.144 15-Dec-2023 bluhm

Use inpcb table mutex to set addresses.

Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.

OK sashan@ mvs@


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller-provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
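
The growth rule above (double the table once entries reach 75% of its
size) can be sketched as follows. This is a minimal illustration under
stated assumptions, not the in_pcb.c code: struct pcbtable_sketch,
pcbtable_needs_resize() and pcbtable_insert() are hypothetical names,
and the rehashing of existing entries is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical table header: only the two counters the 75% rule needs.
 * This is not the layout of struct inpcbtable.
 */
struct pcbtable_sketch {
	size_t	count;	/* number of PCBs currently hashed */
	size_t	size;	/* number of hash buckets */
};

/* Nonzero once count has reached 3/4 of size. */
static int
pcbtable_needs_resize(const struct pcbtable_sketch *t)
{
	assert(t->size > 0);
	/* integer form of count >= 0.75 * size, no floating point */
	return (t->count * 4 >= t->size * 3);
}

/* Account for one new entry, doubling the bucket count if needed. */
static void
pcbtable_insert(struct pcbtable_sketch *t)
{
	t->count++;
	if (pcbtable_needs_resize(t))
		t->size *= 2;	/* real code would also rehash all entries */
}
```

Writing the threshold as count * 4 >= size * 3 keeps the check in integer
arithmetic, which suits kernel code that avoids floating point.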


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
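
A bitmask covering the entire 16-bit port space needs 65536 bits, i.e.
2048 32-bit words (8 KB) per protocol. The sketch below shows that idea
under stated assumptions: the PORTMAP_* macros, struct portmap_sketch and
portmap_is_bad() are hypothetical names, not the definitions in in_pcb.h.
It shifts an unsigned 1U because, as a later commit in this log notes,
shifting a signed integer left by 31 is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

#define PORTMAP_BITS	32			/* bits per uint32_t word */
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)	/* 2048 words = 8 KB */

/* 1U, not 1: a signed 1 << 31 would be undefined. */
#define PORTMAP_BIT(p)		(1U << ((p) % PORTMAP_BITS))
#define PORTMAP_SET(m, p)	((m)[(p) / PORTMAP_BITS] |= PORTMAP_BIT(p))
#define PORTMAP_CLR(m, p)	((m)[(p) / PORTMAP_BITS] &= ~PORTMAP_BIT(p))
#define PORTMAP_ISSET(m, p)	(((m)[(p) / PORTMAP_BITS] & PORTMAP_BIT(p)) != 0)

/* Hypothetical pair of tables, one bit per port for each protocol. */
struct portmap_sketch {
	uint32_t	tcp[PORTMAP_WORDS];
	uint32_t	udp[PORTMAP_WORDS];
};

/* Would a dynamic port picker have to skip this port? */
static int
portmap_is_bad(const uint32_t *map, unsigned int port)
{
	assert(port <= 65535);
	return PORTMAP_ISSET(map, port);
}
```

For example, PORTMAP_SET(bad.tcp, 853) would mark the DoT port mentioned
elsewhere in this log so that a picker consulting portmap_is_bad() skips it.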


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf balony.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now it is not only
unnecessary but could also create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.143 07-Dec-2023 bluhm

Inpcb table mutex protects addr and port during bind(2) and connect(2).

in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set
addresses and ports within the same critical section as the inpcb
hash table calculation. Also lookup and address selection have to
be protected to avoid bindings and connections that are not unique.

For that in_pcbpickport() and in_pcbbind_locked() expect that the
table mutex is already taken. The functions in_pcblookup_lock(),
in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the
mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the
parameter is IN_PCBLOCK_HOLD and the lock has to be taken already.
Note that in_pcblookup_lock() and in_pcblookup_local() return an
inp with increased reference iff they take and release the lock.
Otherwise the caller protects the life time of the inp.

This gives enough flexibility that in_pcbbind() and in_pcbconnect()
can hold the table mutex when they need it. The public inpcb API
does not change.

OK sashan@ mvs@


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
a PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect a detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with the IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
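
The bitmask described in 1.65 can be sketched as one bit per port across the whole 0..65535 range, which takes 2048 32-bit words. The names `portmap_t`, `portmap_set()`, `portmap_clr()` and `portmap_isset()` are hypothetical stand-ins for the header's DP_* macros (the log mentions `DP_CLR()`), not the kernel's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one bit for each of the 65536 ports. */
#define PORT_SPACE	65536u
#define MAP_BITS	32u				/* bits per word */
#define MAP_WORDS	(PORT_SPACE / MAP_BITS)		/* 2048 words */

typedef struct {
	uint32_t bits[MAP_WORDS];
} portmap_t;

/* The unsigned constant 1u keeps a shift into bit 31 well defined. */
static inline void
portmap_set(portmap_t *m, uint16_t port)
{
	m->bits[port / MAP_BITS] |= 1u << (port % MAP_BITS);
}

static inline void
portmap_clr(portmap_t *m, uint16_t port)
{
	m->bits[port / MAP_BITS] &= ~(1u << (port % MAP_BITS));
}

static inline bool
portmap_isset(const portmap_t *m, uint16_t port)
{
	return (m->bits[port / MAP_BITS] >> (port % MAP_BITS)) & 1u;
}
```

Reserving a port such as UDP 623 is then a single `portmap_set()` call, and the port allocator skips any candidate for which `portmap_isset()` is true.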


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
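
The cross-linking introduced in 1.64 amounts to a pair of back pointers that must always be cleared together. A minimal sketch, assuming single-threaded access: `struct pcb`, `struct statekey` and the helper functions are hypothetical stand-ins for `struct inpcb` and `struct pf_state_key`, and locking and lifetime management are left out.

```c
#include <stddef.h>

struct statekey;

struct pcb {
	struct statekey	*sk;	/* cached pf state key, or NULL */
};

struct statekey {
	struct pcb	*pcb;	/* cached pcb, or NULL */
};

/* After the first successful lookup, remember both directions. */
static void
link_pcb_statekey(struct pcb *p, struct statekey *k)
{
	p->sk = k;
	k->pcb = p;
}

/* Removing a pcb clears the state key's pointer back to it. */
static void
pcb_detach(struct pcb *p)
{
	if (p->sk != NULL) {
		p->sk->pcb = NULL;
		p->sk = NULL;
	}
}

/* Removing a state key clears the pcb's pointer back to it. */
static void
statekey_detach(struct statekey *k)
{
	if (k->pcb != NULL) {
		k->pcb->sk = NULL;
		k->pcb = NULL;
	}
}

/* Inbound fast path: a non-NULL result lets the caller skip the lookup. */
static struct pcb *
statekey_cached_pcb(const struct statekey *k)
{
	return k->pcb;
}
```

Clearing both sides in each detach function is what keeps the surviving object from following a dangling pointer.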


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
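
The receive-side rule behind IP_MINTTL (RFC 3682) reduces to one comparison, sketched below under the assumption that the sender always transmits with TTL 255. `gtsm_accept()` and `gtsm_hops()` are hypothetical helpers, not kernel functions; an application would enable the real check with `setsockopt(2)` at level `IPPROTO_IP` with option `IP_MINTTL`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a packet only if its TTL is at least minttl.  With minttl
 * 255 only directly connected peers pass; 254 tolerates one router
 * hop; 0 accepts everything.
 */
static bool
gtsm_accept(uint8_t pkt_ttl, uint8_t minttl)
{
	return pkt_ttl >= minttl;
}

/* Router hops a packet crossed, assuming the sender used TTL 255. */
static unsigned
gtsm_hops(uint8_t pkt_ttl)
{
	return 255u - pkt_ttl;
}
```

A remote attacker more hops away cannot forge a packet that arrives with a high enough TTL, which is why the check is cheap yet effective.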


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
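
A minimal userland sketch of the bounded port search described in 1.3: start at a random offset inside a configured range (such as the default or high range) and examine each candidate at most once, so an exhausted range ends the search instead of spinning in an endless loop. `pick_port()`, `port_busy_fn`, `struct port_list` and `busy_in_list()` are hypothetical. `rand()` is used only for portability; kernel code would use a stronger source such as `arc4random_uniform()`. The sketch assumes `first` is at least 1, since 0 serves as the "no port" result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback: true if the port is bound or reserved. */
typedef bool (*port_busy_fn)(uint16_t port, void *arg);

/*
 * Return a free port in [first, last], or 0 if every port in the
 * range is busy.  At most (last - first + 1) candidates are tried,
 * starting at a random offset and wrapping around.
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, port_busy_fn busy, void *arg)
{
	if (first > last) {		/* accept a reversed range */
		uint16_t tmp = first;
		first = last;
		last = tmp;
	}
	uint32_t count = (uint32_t)last - first + 1;
	uint32_t start = (uint32_t)rand() % count;

	for (uint32_t i = 0; i < count; i++) {
		uint16_t port = (uint16_t)(first + (start + i) % count);
		if (!busy(port, arg))
			return port;
	}
	return 0;			/* range exhausted */
}

/* Example callback: the busy ports are listed in a small array. */
struct port_list {
	const uint16_t	*ports;
	size_t		 n;
};

static bool
busy_in_list(uint16_t port, void *arg)
{
	const struct port_list *pl = arg;

	for (size_t i = 0; i < pl->n; i++)
		if (pl->ports[i] == port)
			return true;
	return false;
}
```

Because the loop count is fixed by the range size, a caller can turn a 0 result into an error such as EADDRINUSE rather than hanging.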


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not synchronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@
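
The collect-then-process pattern used here (and for PCB notify) can be sketched in userland with a pthread mutex. Matching entries are copied into a temporary array while the table lock is held, and the callback, which may sleep or take other locks, runs only after the mutex is released. All names are hypothetical. The sketch assumes the entries stay valid after the unlock; per the commit message, the kernel relies on the exclusive net lock for that.

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_MAX	64

struct entry {
	int	id;
	int	notified;
};

struct table {
	pthread_mutex_t	 mtx;
	struct entry	*ents[TABLE_MAX];
	size_t		 n;		/* assumed <= TABLE_MAX */
};

static size_t
table_notify(struct table *t, int (*match)(const struct entry *),
    void (*notify)(struct entry *))
{
	struct entry *batch[TABLE_MAX];
	size_t nb = 0;

	/* Phase 1: snapshot the matching entries under the mutex. */
	pthread_mutex_lock(&t->mtx);
	for (size_t i = 0; i < t->n; i++)
		if (match(t->ents[i]))
			batch[nb++] = t->ents[i];
	pthread_mutex_unlock(&t->mtx);

	/* Phase 2: the callback runs without the mutex held. */
	for (size_t i = 0; i < nb; i++)
		notify(batch[i]);
	return nb;
}

/* Example callbacks. */
static int
match_even(const struct entry *e)
{
	return e->id % 2 == 0;
}

static void
mark_notified(struct entry *e)
{
	e->notified = 1;
}
```

The design trades a short extra copy for a lock order in which no mutex is ever held while the slower, possibly sleeping path runs.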


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
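
Reference counting as described in 1.111 means each holder (the PCB queue and hashes, a pf mbuf header, a pf state key) owns one count, and the last release frees the object. A minimal C11 sketch with hypothetical names; the kernel uses its own refcnt interface from `sys/refcnt.h` rather than this code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct inpcb. */
struct obj {
	atomic_uint	refs;
};

static struct obj *
obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o != NULL)
		atomic_init(&o->refs, 1);	/* the creator's reference */
	return o;
}

/* Take a reference when a new holder starts pointing at the object. */
static struct obj *
obj_ref(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
	return o;
}

/* Drop a reference; the last holder frees.  Returns true if freed. */
static bool
obj_rele(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refs, 1,
	    memory_order_acq_rel) != 1)
		return false;
	free(o);
	return true;
}
```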


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
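
Keying the PCB hash with SipHash means an attacker who does not know the secret cannot pick address/port tuples that all land in one bucket. Below is a self-contained SipHash-2-4 plus a hypothetical `pcb_bucket()` that hashes a 16-byte IPv4 tuple (in host byte order) and masks the result down to a power-of-two table. The kernel has its own SipHash implementation; the function names, tuple layout and key handling here are illustrative assumptions only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Little-endian load of 8 bytes. */
static uint64_t
load64_le(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of len bytes at in, keyed with the 16-byte key k. */
static uint64_t
siphash24(const uint8_t k[16], const void *in, size_t len)
{
	const uint8_t *m = in;
	uint64_t k0 = load64_le(k), k1 = load64_le(k + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;	/* length in the top byte */
	size_t i;

	for (i = 0; i + 8 <= len; i += 8) {	/* full 8-byte words */
		uint64_t w = load64_le(m + i);
		v3 ^= w;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= w;
	}
	for (size_t j = 0; i + j < len; j++)	/* 0..7 trailing bytes */
		b |= (uint64_t)m[i + j] << (8 * j);

	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	v2 ^= 0xff;				/* finalization */
	for (int r = 0; r < 4; r++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical bucket index for a tuple; mask is table size - 1. */
static uint64_t
pcb_bucket(const uint8_t key[16], uint32_t laddr, uint16_t lport,
    uint32_t faddr, uint16_t fport, uint32_t rtable, uint64_t mask)
{
	uint8_t buf[16];

	memcpy(buf, &laddr, 4);
	memcpy(buf + 4, &lport, 2);
	memcpy(buf + 6, &faddr, 4);
	memcpy(buf + 10, &fport, 2);
	memcpy(buf + 12, &rtable, 4);
	return siphash24(key, buf, sizeof(buf)) & mask;
}
```

The key would be filled from a random source once at boot; with a fixed or guessable key the flooding protection disappears.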


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
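
The growth rule from 1.86 can be sketched with integer arithmetic only, since kernel code generally avoids floating point. `table_needs_grow()` and `table_next_size()` are hypothetical names; the sketch only decides the new size and leaves rehashing the existing entries to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* True once count reaches 75% of size (count / size >= 3 / 4). */
static bool
table_needs_grow(size_t count, size_t size)
{
	return count * 4 >= size * 3;
}

/* Double the size until the entry count is below the 75% mark. */
static size_t
table_next_size(size_t count, size_t size)
{
	if (size == 0)		/* avoid doubling zero forever */
		size = 1;
	while (table_needs_grow(count, size))
		size *= 2;
	return size;
}
```

With 128 buckets, the 96th entry crosses the threshold and the table grows to 256; keeping sizes a power of two lets the bucket index be computed with a mask.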


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option, used at the IPPROTO_IP
level, that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
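
A minimal C sketch of such a bitmask, one bit per port across the full
0..65535 range. The macro names echo DP_CLR() from revision 1.8, but the
word size, the struct name baddynamic_sketch and the exact macro bodies
are assumptions for illustration, not the in_pcb.h definitions.

```c
#include <stdint.h>

/* Hypothetical layout: one bit per port, 65536 ports in 32-bit words. */
#define DP_MAPBITS	32
#define DP_MAPSIZE	(65536 / DP_MAPBITS)	/* 2048 words per protocol */

/* Shifting 1U (not 1) keeps bit 31 well defined. */
#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (1U << ((p) % DP_MAPBITS)))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(1U << ((p) % DP_MAPBITS)))
#define DP_ISSET(m, p)	(((m)[(p) / DP_MAPBITS] >> ((p) % DP_MAPBITS)) & 1U)

/* Separate maps for TCP and UDP, as with the two sysctls above. */
struct baddynamic_sketch {
	uint32_t tcp[DP_MAPSIZE];
	uint32_t udp[DP_MAPSIZE];
};
```

Marking an X11 port as never-dynamic would then be DP_SET(bd.tcp, 6000),
and a port picker would skip every port for which DP_ISSET() is true.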


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
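
A rough sketch of the bidirectional cache described above, using
hypothetical stand-in types (pcb_sketch, statekey_sketch) rather than the
real struct inpcb and pf state key. It shows the link, the clear-on-removal
rule and the inbound fast path; reference counting and locking are
deliberately left out.

```c
#include <stddef.h>

struct pcb_sketch;

/* Hypothetical stand-ins for struct pf_state_key and struct inpcb. */
struct statekey_sketch {
	struct pcb_sketch *inp;		/* cached PCB, NULL if none */
};

struct pcb_sketch {
	struct statekey_sketch *pf_sk;	/* cached state key, NULL if none */
};

/* After the first successful PCB lookup, link both objects. */
static void
link_pcb_statekey(struct pcb_sketch *inp, struct statekey_sketch *sk)
{
	inp->pf_sk = sk;
	sk->inp = inp;
}

/*
 * When the PCB goes away, clear both pointers so neither side dangles.
 * A symmetric helper would run when the state key is removed.
 */
static void
unlink_pcb_statekey(struct pcb_sketch *inp)
{
	if (inp->pf_sk != NULL) {
		inp->pf_sk->inp = NULL;
		inp->pf_sk = NULL;
	}
}

/* Inbound fast path: reuse the cached PCB, else fall back to a lookup. */
static struct pcb_sketch *
inbound_pcb(struct statekey_sketch *sk, struct pcb_sketch *(*slow_lookup)(void))
{
	if (sk->inp != NULL)
		return sk->inp;
	return slow_lookup();
}
```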


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header files,
so new code has to use the RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
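
A sketch of the receive-side check, assuming the peer follows the scheme
above and always sends with TTL 255. The helper names are hypothetical; in
userland the minimum would be set with setsockopt(2) at the IPPROTO_IP level
using IP_MINTTL, and ip(4) should be consulted for the exact argument type.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The peer sends with TTL 255 and every router decrements it by one,
 * so allowing max_hops routers means accepting TTL >= 255 - max_hops.
 */
static uint8_t
minttl_for_hops(unsigned int max_hops)
{
	return max_hops >= 255 ? 0 : (uint8_t)(255 - max_hops);
}

/* A minimum of 0 is treated as "check disabled" in this sketch. */
static bool
ttl_acceptable(uint8_t pkt_ttl, uint8_t min_ttl)
{
	return min_ttl == 0 || pkt_ttl >= min_ttl;
}
```

For a directly connected BGP neighbour the minimum would be 255, so a
spoofed packet that crossed even one router is dropped.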


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
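
A simplified sketch of the reversed match order, with a hypothetical linear
table instead of the kernel's hashed lookup and an IPv4-only, host-byte-order
listener struct. The translated_to_localhost flag stands in for the mbuf tag
mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY	0u	/* wildcard, like INADDR_ANY */

/* Hypothetical, heavily simplified listening socket entry. */
struct listener {
	uint32_t laddr;		/* host byte order; ADDR_ANY for wildcard */
	uint16_t lport;
};

static const struct listener *
find_exact(const struct listener *tab, size_t n, uint32_t addr, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].laddr == addr && tab[i].lport == port)
			return &tab[i];
	return NULL;
}

/*
 * Normally the specific address wins over the wildcard. For packets
 * redirected to localhost, try the wildcard first so the connection
 * does not appear local to a double-binding daemon.
 */
static const struct listener *
lookup_listen(const struct listener *tab, size_t n, uint32_t dst,
    uint16_t port, bool translated_to_localhost)
{
	uint32_t first = translated_to_localhost ? ADDR_ANY : dst;
	uint32_t second = translated_to_localhost ? dst : ADDR_ANY;
	const struct listener *l = find_exact(tab, n, first, port);

	return l != NULL ? l : find_exact(tab, n, second, port);
}
```

With a portmap-style daemon bound to both 127.0.0.1 and * on the same port,
a redirected packet lands on the wildcard socket, while ordinary loopback
traffic still reaches the localhost socket.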


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api; the kernel notifies (EMT_REQUESTSA)
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
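
A hedged sketch of a bounded search that cannot fall into the infinite loop
mentioned above. The caller is assumed to choose first/last from the selected
range (default, high or low, e.g. net.inet.ip.portfirst and
net.inet.ip.portlast for the default range); pick_port(), port_busy_fn and
the example callback are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: is this local port already bound or reserved? */
typedef bool (*port_busy_fn)(uint16_t port, void *arg);

/*
 * Walk [first, last] (first <= last) starting at a caller-supplied offset,
 * e.g. a random value, and try each port at most once. An exhausted range
 * returns EADDRINUSE instead of spinning forever.
 */
static int
pick_port(uint16_t first, uint16_t last, uint32_t start,
    port_busy_fn busy, void *arg, uint16_t *out)
{
	uint32_t count = (uint32_t)last - first + 1;
	uint32_t i;

	for (i = 0; i < count; i++) {
		uint16_t port = (uint16_t)(first + (start + i) % count);

		if (!busy(port, arg)) {
			*out = port;
			return 0;
		}
	}
	return EADDRINUSE;
}

/* Example callback: every port in [lo, hi] counts as busy. */
struct busy_range {
	uint16_t lo, hi;
};

static bool
busy_in_range(uint16_t port, void *arg)
{
	const struct busy_range *r = arg;

	return port >= r->lo && port <= r->hi;
}
```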


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.142 03-Dec-2023 bluhm

Use INP_IPV6 flag instead of sotopf().

During initialization in_pcballoc() sets INP_IPV6 once to avoid
reaching through inp_socket->so_proto->pr_domain->dom_family. Use
this flag consistently.

OK sashan@ mvs@


# 1.141 01-Dec-2023 bluhm

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@
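
A minimal userland sketch of the pattern, with a pthread mutex standing in
for the kernel's table mutex and hypothetical pcb/pcb_table types. It only
illustrates that the hashed fields and the bucket they map to change inside
one critical section; the placeholder mixing is not the keyed hash the
kernel uses.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified PCB and table; not the kernel structures. */
struct pcb {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
	unsigned int rtable;
	uint32_t hash;		/* bucket derived from the fields above */
};

struct pcb_table {
	pthread_mutex_t mtx;	/* protects the hash and the hashed fields */
	uint32_t mask;		/* number of buckets - 1 */
};

static uint32_t
pcb_hash(const struct pcb_table *t, const struct pcb *p)
{
	/* Placeholder mixing only; a real table uses a keyed hash. */
	uint32_t h = p->laddr ^ p->faddr ^ p->rtable;

	h ^= ((uint32_t)p->lport << 16) | p->fport;
	return h & t->mask;
}

/*
 * Update the foreign address and port and rehash in one critical
 * section, so a concurrent lookup never sees new fields in a stale bucket.
 */
static void
pcb_set_foreign(struct pcb_table *t, struct pcb *p, uint32_t faddr,
    uint16_t fport)
{
	pthread_mutex_lock(&t->mtx);
	p->faddr = faddr;
	p->fport = fport;
	p->hash = pcb_hash(t, p);
	/* ...move p onto the list for bucket p->hash here... */
	pthread_mutex_unlock(&t->mtx);
}
```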


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@
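
A sketch of the copy-out convention under stated assumptions: one
hypothetical global address record protected by a pthread mutex stands in
for the lock-managed interface address, and EADDRNOTAVAIL is an assumed
error value. The caller receives a private copy that stays valid after the
lock is released.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address record whose lifetime is managed under a lock. */
struct ifaddr_sketch {
	uint32_t addr;
};

static pthread_mutex_t ifa_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct ifaddr_sketch *current_ifa;	/* protected by ifa_mtx */

/*
 * Copy the selected address into caller-provided memory while the lock
 * is held, instead of returning a pointer that may dangle once the lock
 * is dropped.
 */
static int
selsrc_copy(uint32_t *out)
{
	int error = EADDRNOTAVAIL;

	pthread_mutex_lock(&ifa_mtx);
	if (current_ifa != NULL) {
		*out = current_ifa->addr;
		error = 0;
	}
	pthread_mutex_unlock(&ifa_mtx);
	return error;
}
```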


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
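
A two-phase sketch of the workaround: collect and reference matching PCBs
while holding the table mutex, then drop the mutex before calling a notify
function that may sleep. The npcb type and the notify_one() stand-in are
hypothetical; the log says the notify list itself is protected by the
exclusive netlock, which this single-threaded sketch does not model.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical PCB with a table link and a separate notify link. */
struct npcb {
	struct npcb *next;		/* table list, protected by tbl_mtx */
	struct npcb *notify_next;	/* temporary notify list */
	int refs;			/* protected by tbl_mtx here */
	int matches;			/* does this PCB match the event? */
	int notified;
};

static pthread_mutex_t tbl_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a notify function that may sleep, e.g. on the pf lock. */
static void
notify_one(struct npcb *p)
{
	p->notified++;
}

static int
notify_all(struct npcb *head)
{
	struct npcb *list = NULL, *p;
	int n = 0;

	/* Phase 1: under the mutex, only collect and reference matches. */
	pthread_mutex_lock(&tbl_mtx);
	for (p = head; p != NULL; p = p->next) {
		if (p->matches) {
			p->refs++;
			p->notify_next = list;
			list = p;
		}
	}
	pthread_mutex_unlock(&tbl_mtx);

	/* Phase 2: mutex released, so notify_one() is allowed to sleep. */
	while ((p = list) != NULL) {
		list = p->notify_next;
		notify_one(p);
		pthread_mutex_lock(&tbl_mtx);
		p->refs--;
		pthread_mutex_unlock(&tbl_mtx);
		n++;
	}
	return n;
}
```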


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
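
A minimal sketch of the hook, using hypothetical pkt and upcall_pcb types in
place of mbufs and struct inpcb. The assumed contract is that the upcall
returns NULL when it has taken the packet and returns the packet otherwise,
so normal socket delivery continues.

```c
#include <stddef.h>

/* Hypothetical packet and PCB types standing in for mbuf and inpcb. */
struct pkt {
	int len;
};

typedef struct pkt *(*upcall_fn)(void *arg, struct pkt *m);

struct upcall_pcb {
	upcall_fn upcall;	/* NULL if no in-kernel consumer */
	void *upcall_arg;
	int queued;		/* packets appended to the socket buffer */
};

/* Input path: the upcall gets the first look at every packet. */
static void
deliver(struct upcall_pcb *inp, struct pkt *m)
{
	if (inp->upcall != NULL) {
		m = inp->upcall(inp->upcall_arg, m);
		if (m == NULL)
			return;		/* stolen, e.g. by a tunnel driver */
	}
	inp->queued++;			/* stand-in for the socket append */
}

/* Example upcall: steal packets longer than *(int *)arg bytes. */
static struct pkt *
steal_big(void *arg, struct pkt *m)
{
	return m->len > *(int *)arg ? NULL : m;
}
```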


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
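
A sketch of the reference-counting rule, using C11 atomics instead of the
kernel's own refcnt interface from sys/refcnt.h; rpcb and its helpers are
hypothetical. The table holds the initial reference, each additional holder
(a pf mbuf header or state key, per the log) takes one more, and whoever
drops the last reference frees the object.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted PCB. */
struct rpcb {
	atomic_uint refs;
};

static struct rpcb *
rpcb_alloc(void)
{
	struct rpcb *p = malloc(sizeof(*p));

	if (p != NULL)
		atomic_init(&p->refs, 1);	/* reference held by the table */
	return p;
}

/* Take a reference, e.g. when a pf state key starts pointing at it. */
static struct rpcb *
rpcb_ref(struct rpcb *p)
{
	atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
	return p;
}

/* Drop a reference; the last holder frees. Returns 1 if freed. */
static int
rpcb_unref(struct rpcb *p)
{
	if (atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel) == 1) {
		free(p);
		return 1;
	}
	return 0;
}
```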


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
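
For reference, a self-contained SipHash-2-4 (Aumasson and Bernstein) in C,
plus a hypothetical pcb_bucket() that hashes an IPv4 connection tuple under
a secret key. The kernel has its own SipHash implementation and its own
choice of hashed fields; this version is only an illustration. The fields
are serialized into a byte buffer so that struct padding never reaches the
hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t
le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of len bytes under a 128-bit key. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *in, size_t len)
{
	uint64_t k0 = le64(key), k1 = le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t m, b = (uint64_t)len << 56;
	size_t i, tail = len & 7;

	for (i = 0; i + 8 <= len; i += 8) {	/* full 8-byte blocks */
		m = le64(in + i);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	for (size_t j = 0; j < tail; j++)	/* last block with length byte */
		b |= (uint64_t)in[i + j] << (8 * j);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;
	v2 ^= 0xff;				/* finalization */
	for (int r = 0; r < 4; r++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical bucket choice for (rdomain, faddr, fport, laddr, lport). */
static uint32_t
pcb_bucket(const uint8_t key[16], uint32_t rdomain, uint32_t faddr,
    uint16_t fport, uint32_t laddr, uint16_t lport, uint32_t mask)
{
	uint8_t buf[16];	/* explicit layout, host byte order */

	memcpy(buf, &rdomain, 4);
	memcpy(buf + 4, &faddr, 4);
	memcpy(buf + 8, &fport, 2);
	memcpy(buf + 10, &laddr, 4);
	memcpy(buf + 14, &lport, 2);
	return (uint32_t)siphash24(key, buf, sizeof(buf)) & mask;
}
```

Because the bucket depends on a secret key, a remote peer cannot precompute
tuples that all collide in one bucket; giving each hash its own key, as
revision 1.113 later does, keeps the two hashes independent.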


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
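
The growth rule can be written with integer arithmetic only. The helper
next_table_size() is hypothetical, and rehashing the existing entries into
the new table is not shown.

```c
#include <stdint.h>

/*
 * Double the table once the entry count reaches 75% of its size.
 * count * 4 >= size * 3 is the same test as count / size >= 0.75,
 * computed in 64 bits so the products cannot overflow.
 */
static uint32_t
next_table_size(uint32_t count, uint32_t size)
{
	if ((uint64_t)count * 4 >= (uint64_t)size * 3)
		return size * 2;
	return size;
}
```

For a 128-entry table, inserting the 96th PCB triggers the resize to 256.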


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@

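The baddynamic change in 1.65 widens a bitmask so that every one of the 65536 port numbers has its own bit. A standalone C sketch of such a map, in the spirit of the header's DP_* macros (for example the DP_CLR() added in 1.8); the PORTMAP_* names and the 32-bit word size are assumptions for illustration, not the real definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names: one bit per port, stored in 32-bit words. */
#define PORTMAP_BITS	(sizeof(uint32_t) * CHAR_BIT)	/* 32 bits per word */
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)		/* words covering every port */

/* Word index is port / 32, bit index is port % 32; 1U keeps bit 31 well defined. */
#define PORTMAP_SET(m, p)	((m)[(p) / PORTMAP_BITS] |= (1U << ((p) % PORTMAP_BITS)))
#define PORTMAP_CLR(m, p)	((m)[(p) / PORTMAP_BITS] &= ~(1U << ((p) % PORTMAP_BITS)))
#define PORTMAP_ISSET(m, p)	\
	(((m)[(p) / PORTMAP_BITS] & (1U << ((p) % PORTMAP_BITS))) != 0)
```

At 32 bits per word the full map is 2048 words, or 8 KiB per protocol; a port allocator would skip any port whose bit is set.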

# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
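The idea behind this change can be sketched in standalone C: a secret 128-bit key feeds SipHash-2-4, so a remote sender cannot predict which bucket a given address/port tuple lands in and cannot aim a flood at a single hash chain. This is not OpenBSD's code (the kernel ships its own SipHash implementation); the names siphash24() and pcb_bucket() and the byte layout of the tuple are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound over the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	(v0) += (v1); (v1) = ROTL64((v1), 13); (v1) ^= (v0);		\
	(v0) = ROTL64((v0), 32);					\
	(v2) += (v3); (v3) = ROTL64((v3), 16); (v3) ^= (v2);		\
	(v0) += (v3); (v3) = ROTL64((v3), 21); (v3) ^= (v0);		\
	(v2) += (v1); (v1) = ROTL64((v1), 17); (v1) ^= (v2);		\
	(v2) = ROTL64((v2), 32);					\
} while (0)

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
uint64_t
siphash24(const uint8_t key[16], const void *data, size_t len)
{
	const uint8_t *in = data;
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t m, b = (uint64_t)len << 56;	/* length byte on top */
	size_t full = len - len % 8, i;

	for (i = 0; i < full; i += 8) {
		m = load_le64(in + i);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	for (i = 0; i < len % 8; i++)	/* pack the 0..7 trailing bytes */
		b |= (uint64_t)in[full + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	v2 ^= 0xff;
	for (i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical helper: hash a connection 4-tuple into a bucket index. */
size_t
pcb_bucket(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, size_t mask)
{
	uint8_t buf[12];

	assert((mask & (mask + 1)) == 0);	/* table size is a power of two */
	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &fport, 2);
	memcpy(buf + 6, &laddr, 4);
	memcpy(buf + 10, &lport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

With key bytes 00..0f and the 15-byte message 00..0e, siphash24() should return 0xa129ca6149be45e5, the example value published by the SipHash authors.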


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
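The growth rule in this commit (double the table when entries reach 75% of its size) can be illustrated with a toy chained hash table. Everything here is hypothetical, including the identity hash (key & mask) used to keep it short; the kernel table stores inpcbs and uses a keyed hash. The comparison count * 4 >= size * 3 is the integer form of count >= 0.75 * size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chained hash table of unsigned keys (not struct inpcbtable). */
struct node {
	unsigned	 key;
	struct node	*next;
};

struct table {
	struct node	**buckets;
	size_t		  size;		/* bucket count, a power of two */
	size_t		  count;	/* number of stored entries */
};

int
table_init(struct table *t, size_t size)
{
	assert(size > 0 && (size & (size - 1)) == 0);
	t->buckets = calloc(size, sizeof(*t->buckets));
	if (t->buckets == NULL)
		return -1;
	t->size = size;
	t->count = 0;
	return 0;
}

/* Move every entry into a bucket array twice as large. */
static int
table_grow(struct table *t)
{
	size_t nsize = t->size * 2, i, h;
	struct node **nb, *n, *next;

	if ((nb = calloc(nsize, sizeof(*nb))) == NULL)
		return -1;
	for (i = 0; i < t->size; i++) {
		for (n = t->buckets[i]; n != NULL; n = next) {
			next = n->next;
			h = n->key & (nsize - 1);	/* identity hash, for brevity */
			n->next = nb[h];
			nb[h] = n;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
	return 0;
}

int
table_insert(struct table *t, unsigned key)
{
	struct node *n;
	size_t h = key & (t->size - 1);

	if ((n = malloc(sizeof(*n))) == NULL)
		return -1;
	n->key = key;
	n->next = t->buckets[h];
	t->buckets[h] = n;
	t->count++;
	/* Double once count reaches 75% of size. */
	if (t->count * 4 >= t->size * 3)
		(void)table_grow(t);	/* if growing fails, keep the old table */
	return 0;
}

int
table_contains(const struct table *t, unsigned key)
{
	const struct node *n;

	for (n = t->buckets[key & (t->size - 1)]; n != NULL; n = n->next)
		if (n->key == key)
			return 1;
	return 0;
}
```

Starting with 4 buckets, the third insert reaches 3/4 and triggers growth to 8 buckets; existing entries are rehashed into the larger array.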


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option at the IPPROTO_IP level. It
allows one to retrieve, with recvmsg(2), the original routing domain
of UDP datagrams diverted by pf via "divert-to".

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
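A bitmask covering all 65536 ports needs 65536 / 32 = 2048 32-bit words per protocol, or 8 KB. The macros below are loosely modeled on the DP_* helpers in this header (DP_CLR was added in 1.8); treat their exact spelling as an assumption, and tcp_port_is_bad() as a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>

#ifndef NBBY
#define NBBY		8				/* bits per byte */
#endif
#define DP_MAPBITS	(sizeof(uint32_t) * NBBY)	/* 32 ports per word */
#define DP_MAPSIZE	(65536 / DP_MAPBITS)		/* 2048 words */

/* Unsigned 1U so that bit 31 is well defined. */
#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (1U << ((p) & (DP_MAPBITS - 1))))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(1U << ((p) & (DP_MAPBITS - 1))))
#define DP_ISSET(m, p)	((m)[(p) / DP_MAPBITS] & (1U << ((p) & (DP_MAPBITS - 1))))

struct baddynamicports {
	uint32_t	tcp[DP_MAPSIZE];
	uint32_t	udp[DP_MAPSIZE];
};

/* Should the ephemeral port allocator skip this TCP port? */
int
tcp_port_is_bad(const struct baddynamicports *bd, unsigned port)
{
	assert(port <= 65535);
	return DP_ISSET(bd->tcp, port) != 0;
}
```

Port 587, which revision 1.23 reserved for sendmail, would be marked with DP_SET(bd.tcp, 587); with the full-range map, ports above 1024 such as the X11 ports added in 1.66 fit as well.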


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
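The pointer caching described in this commit can be sketched with two simplified structs that point at each other. The names (struct pcb, struct statekey, slow_lookup()) are hypothetical stand-ins for struct inpcb, struct pf_state_key and the PCB hash lookup, and the sketch is single-threaded with no locking. The key property is that either side clears both pointers when it is removed, so neither is left dangling.

```c
#include <assert.h>
#include <stddef.h>

struct statekey;

/* Hypothetical stand-ins for struct inpcb and struct pf_state_key. */
struct pcb {
	struct statekey	*sk;	/* linked pf state key, or NULL */
};

struct statekey {
	struct pcb	*inp;	/* linked PCB, or NULL */
};

/* Hypothetical slow path standing in for the full PCB hash lookup. */
struct pcb	*slow_lookup_result;
int		 slow_lookups;

static struct pcb *
slow_lookup(void)
{
	slow_lookups++;
	return slow_lookup_result;
}

void
link_pcb(struct statekey *sk, struct pcb *inp)
{
	assert(sk != NULL && inp != NULL);
	sk->inp = inp;
	inp->sk = sk;
}

/* Clear both pointers when the PCB goes away. */
void
unlink_pcb(struct pcb *inp)
{
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}

/* Clear both pointers when the state key goes away. */
void
unlink_statekey(struct statekey *sk)
{
	if (sk->inp != NULL) {
		sk->inp->sk = NULL;
		sk->inp = NULL;
	}
}

/* Inbound packet: use the cached PCB, or look it up once and cache it. */
struct pcb *
inbound_pcb(struct statekey *sk)
{
	struct pcb *inp;

	if (sk != NULL && sk->inp != NULL)
		return sk->inp;
	inp = slow_lookup();
	if (sk != NULL && inp != NULL)
		link_pcb(sk, inp);
	return inp;
}
```

Only the first packet of a flow pays for slow_lookup(); after unlinking, the next packet falls back to it again.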


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
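The TTL security check reduces to a single comparison on the receive path. The sketch below is a hypothetical standalone version, not the kernel code: minttl_accept() models the check and ttl_after_hops() models routers decrementing the TTL. A directly connected peer sends with TTL 255, so a minimum of 255 rejects anything that crossed even one router, and an off-link attacker cannot make a packet arrive with a higher TTL than it was sent with.

```c
#include <assert.h>
#include <stdint.h>

/* TTL seen by the receiver after 'hops' routers each decrement it by one. */
uint8_t
ttl_after_hops(uint8_t sent, unsigned hops)
{
	return hops >= sent ? 0 : (uint8_t)(sent - hops);
}

/*
 * Receive-side check in the spirit of IP_MINTTL: a minimum of 0 means
 * the option is off; otherwise drop any packet whose TTL is below it.
 */
int
minttl_accept(uint8_t minttl, uint8_t ttl)
{
	return minttl == 0 || ttl >= minttl;
}
```

Allowing a peer one router away means lowering the minimum to 254, which is the "no more than expected" case mentioned in the commit.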


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
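A small illustration of the rule in this commit. The struct is hypothetical; it shows the portable declaration style with unsigned int and why unsigned is the safer choice: whether a plain int bit-field is signed is implementation-defined, and a signed 1-bit field can only hold 0 and -1. Bit-fields declared as u_char or u_short are compiler extensions.

```c
#include <assert.h>

/* Portable bit-field types: int, signed int, unsigned int, _Bool (C99). */
struct pcb_flags {			/* hypothetical */
	unsigned int	f_bound:1;	/* 0 or 1 */
	unsigned int	f_level:3;	/* 0..7, wraps modulo 8 */
};
```

Because the fields are unsigned, incrementing f_level from 7 wraps to 0 with well-defined behavior.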


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
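The reversed match order can be sketched with a toy listener table. The names, the linear search and the ADDR_ANY constant are assumptions for illustration, not the in{6}_pcblookup_listen() code. With a daemon bound to both 127.0.0.1 and the wildcard on the same port (as portmap does), an ordinary connection to 127.0.0.1 finds the localhost socket, while one that pf translated to localhost finds the wildcard socket first and is not treated as local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ANY	0u	/* hypothetical wildcard address */

struct listener {
	uint32_t	laddr;	/* ADDR_ANY for a wildcard bind */
	uint16_t	lport;
};

static const struct listener *
find_exact(const struct listener *tab, size_t n, uint32_t addr, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].laddr == addr && tab[i].lport == port)
			return &tab[i];
	return NULL;
}

/*
 * Normally prefer the socket bound to the specific local address and
 * fall back to the wildcard. For packets translated to localhost,
 * reverse the order: wildcard first, specific address second.
 */
const struct listener *
lookup_listen(const struct listener *tab, size_t n, uint32_t laddr,
    uint16_t lport, int translated_localhost)
{
	uint32_t first = translated_localhost ? ADDR_ANY : laddr;
	uint32_t second = translated_localhost ? laddr : ADDR_ANY;
	const struct listener *l;

	if ((l = find_exact(tab, n, first, lport)) != NULL)
		return l;
	return find_exact(tab, n, second, lport);
}
```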


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
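The deadlock fix described here comes down to bounding the search for a free port. The sketch below is an illustration under assumptions, not the FreeBSD or OpenBSD code: first and last stand for whichever range IP_PORTRANGE selected, rand() stands in for a strong random source, and pick_port() and busy_from_table() are hypothetical names. Each candidate is visited at most once, so an exhausted range returns 0 instead of looping forever.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Caller-supplied predicate: nonzero if 'port' is taken or reserved. */
typedef int (*port_busy_fn)(uint16_t port, void *arg);

/*
 * Pick a free port in [first, last]. Start at a random offset and visit
 * each candidate at most once, so an exhausted range fails with 0
 * instead of spinning forever.
 */
uint16_t
pick_port(uint16_t first, uint16_t last, port_busy_fn busy, void *arg)
{
	uint32_t count, start, i;
	uint16_t port;

	assert(first > 0 && first <= last);	/* 0 means "pick one" */
	count = (uint32_t)last - first + 1;
	start = (uint32_t)rand() % count;	/* stand-in for a strong RNG */
	for (i = 0; i < count; i++) {
		port = (uint16_t)(first + (start + i) % count);
		if (!busy(port, arg))
			return port;
	}
	return 0;	/* range exhausted; a caller would report EADDRINUSE */
}

/* Example predicate: 'arg' is a 65536-byte array of in-use flags. */
int
busy_from_table(uint16_t port, void *arg)
{
	const unsigned char *inuse = arg;

	return inuse[port];
}
```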


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.140 29-Nov-2023 bluhm

Document inp_socket as immutable and remove NULL checks.

Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.

OK mvs@


# 1.139 28-Nov-2023 bluhm

Remove struct inpcb from in6_embedscope() parameters.

rip6_output() did modify inp_outputopts6 temporarily to provide
different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6
and inp_moptions6 as separate arguments to in6_embedscope().
Simplify the code that deals with these options in in6_embedscope().
Document inp_moptions and inp_moptions6 as protected by net lock.

OK kn@


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
the PRU_SOCKADDR request, so keep this behaviour for a while instead of
making the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
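The usual fix for this class of bug is to make the shifted constant unsigned. A minimal example, assuming a 32-bit int; BIT() and set_bit() are hypothetical helpers, not kernel macros. With a signed left operand, 1 << 31 is not representable in int, so C99 6.5.7 leaves the result undefined; 1U << 31 is simply 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned left operand: every shift count 0..31 is well defined. */
#define BIT(n)	(1U << (n))

uint32_t
set_bit(uint32_t word, unsigned n)
{
	assert(n < 32);		/* shifting by >= the width is also undefined */
	return word | BIT(n);
}
```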


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
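A userland sketch of the reference counting pattern, using C11 atomics in place of the kernel's own refcnt API from sys/refcnt.h. The struct and function names are hypothetical. The PCB table holds the initial reference; each additional holder (per the commit, a pf mbuf header or a pf state key) takes one, and whoever drops the last reference frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical PCB with a reference count; not OpenBSD's struct inpcb. */
struct pcb {
	atomic_uint	refs;
};

struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		atomic_init(&p->refs, 1);	/* reference held by the table */
	return p;
}

struct pcb *
pcb_ref(struct pcb *p)
{
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
int
pcb_unref(struct pcb *p)
{
	unsigned int old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		return 1;
	}
	return 0;
}
```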


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(). This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe-murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts time out and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.138 26-Nov-2023 bluhm

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

It results in better performance to calculate the hash value before
taking the mutex. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR requests, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
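
The reference counting described above can be pictured with a minimal sketch. It assumes C11 atomics and a hypothetical `struct pcb`. It is not the kernel's refcnt API. Every holder (PCB queue or hash, pf mbuf header, pf state key) takes one reference, and the last release frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an internet PCB. */
struct pcb {
	atomic_uint refs;	/* one reference per holder */
};

static int pcb_frees;	/* counts how many PCBs were actually freed */

static struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		atomic_init(&p->refs, 1);	/* the creator's reference */
	return p;
}

/* Take an extra reference, e.g. when a pf state key links to the PCB. */
static struct pcb *
pcb_ref(struct pcb *p)
{
	atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
	return p;
}

/* Drop a reference; the holder that drops the last one frees the PCB. */
static void
pcb_unref(struct pcb *p)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel);
	assert(old > 0);	/* releasing a reference that was never taken */
	if (old == 1) {
		free(p);
		pcb_frees++;
	}
}
```

With this design, a pf state key that still points at a PCB keeps the memory valid even after the socket has been closed.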


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as the only objects referenced by the ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
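
The point of a keyed hash is that a remote peer cannot predict which bucket a given address and port tuple lands in, so it cannot flood one chain. Below is a portable SipHash-2-4 sketch for illustration. The kernel has its own SipHash routines, and this is not that code. `pcb_hash()` and its argument packing are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound over the four-word internal state. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t r = 0;

	for (int i = 7; i >= 0; i--)
		r = (r << 8) | p[i];
	return r;
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *m, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	size_t tail = len % 8, end = len - tail;
	uint64_t b;

	for (size_t i = 0; i < end; i += 8) {
		uint64_t mi = load_le64(m + i);

		v3 ^= mi;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= mi;
	}
	/* Final block: leftover bytes plus the message length in the top byte. */
	b = (uint64_t)len << 56;
	for (size_t i = 0; i < tail; i++)
		b |= (uint64_t)m[end + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	v2 ^= 0xff;
	for (int i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/*
 * Hypothetical bucket selection: hash the 4-tuple (raw host-order
 * bytes) with a secret key and mask it to a power-of-two table.
 */
static uint64_t
pcb_hash(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, uint64_t mask)
{
	uint8_t buf[12];

	assert((mask & (mask + 1)) == 0);	/* mask must be 2^n - 1 */
	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &fport, 2);
	memcpy(buf + 6, &laddr, 4);
	memcpy(buf + 10, &lport, 2);
	return siphash24(key, buf, sizeof(buf)) & mask;
}
```

Only the secret key has to stay unpredictable. SipHash itself is public.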


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
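
The 75% trigger can be checked in integer arithmetic without division. This is a minimal sketch. It assumes a hypothetical table struct whose bucket count is a power of two and whose mask is size - 1, and it leaves out the rehashing step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a power-of-two hash table. */
struct pcbtable_sketch {
	size_t count;	/* number of hashed entries */
	size_t size;	/* number of buckets */
	size_t mask;	/* size - 1, used to pick a bucket */
};

/* True once count reaches 75% of the bucket count. */
static int
needs_resize(const struct pcbtable_sketch *t)
{
	return t->count * 4 >= t->size * 3;
}

static void
insert_entry(struct pcbtable_sketch *t)
{
	t->count++;
	if (needs_resize(t)) {
		t->size *= 2;
		t->mask = t->size - 1;
		/* a real table would rehash every entry into the new buckets here */
	}
	assert((t->size & t->mask) == 0);	/* size stays a power of two */
}
```

For a 16-bucket table, the twelfth insert (12 of 16 = 75%) doubles it to 32 buckets.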


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
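
A bitmask that covers all 65536 ports needs 65536 bits, which is 8 KiB per protocol. Below is a minimal sketch of such a map. The `PORTMAP_*` macros and `port_is_bad()` are hypothetical names chosen for illustration. They are not the header's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PORTMAP_BITS	32			/* bits per uint32_t word */
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)	/* 2048 words cover all ports */

/* Hypothetical macros; the unsigned 1U keeps a shift by 31 well defined. */
#define PORTMAP_SET(m, p)	((m)[(p) / PORTMAP_BITS] |= 1U << ((p) % PORTMAP_BITS))
#define PORTMAP_CLR(m, p)	((m)[(p) / PORTMAP_BITS] &= ~(1U << ((p) % PORTMAP_BITS)))
#define PORTMAP_ISSET(m, p)	(((m)[(p) / PORTMAP_BITS] >> ((p) % PORTMAP_BITS)) & 1U)

/* A dynamic source port picker would skip any port marked in the map. */
static int
port_is_bad(const uint32_t map[PORTMAP_WORDS], unsigned int port)
{
	assert(port < 65536);
	return PORTMAP_ISSET(map, port) != 0;
}
```

A sysctl handler would then only need to copy a user-supplied array of `PORTMAP_WORDS` words into the kernel's map.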


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
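
The two-way link between a pf state key and a PCB works like a cache with invalidation. The sketch below uses hypothetical structs and a stubbed slow lookup. It shows the fast path, the linking on first lookup, and the clearing of both pointers when one side goes away.

```c
#include <assert.h>
#include <stddef.h>

struct pcb_k;

/* Hypothetical, simplified pf state key and PCB. */
struct statekey {
	struct pcb_k	*inp;	/* cached PCB, NULL if not linked */
};

struct pcb_k {
	struct statekey	*sk;	/* cached state key, NULL if not linked */
};

static int slow_lookups;	/* counts full PCB table lookups */

/* Stand-in for a real hash table search. */
static struct pcb_k *
pcb_lookup_slow(struct pcb_k *table_entry)
{
	slow_lookups++;
	return table_entry;
}

static struct pcb_k *
input_lookup(struct statekey *sk, struct pcb_k *table_entry)
{
	struct pcb_k *inp;

	if (sk != NULL && sk->inp != NULL)
		return sk->inp;		/* fast path: skip the PCB lookup */
	inp = pcb_lookup_slow(table_entry);
	if (sk != NULL && inp != NULL) {
		sk->inp = inp;		/* link both directions */
		inp->sk = sk;
	}
	return inp;
}

/* When the PCB is removed, clear the pointer held by the state key. */
static void
pcb_detach(struct pcb_k *inp)
{
	assert(inp != NULL);
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}
```

Only the first packet of a flow pays for the table lookup. After that, the cached pointer is followed until either side clears it.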


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
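
The receiver-side check is a single comparison. In this minimal sketch, the helper names are hypothetical and the sender is assumed to transmit with TTL 255. A packet is accepted only if its TTL is at least the configured minimum, so a peer that is at most `hops` routers away needs a minimum of 255 - hops.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum TTL that admits a peer at most `hops` routers away. */
static uint8_t
minttl_for_hops(unsigned int hops)
{
	assert(hops < 255);
	return (uint8_t)(255 - hops);
}

/* Accept the packet only if no more routers than expected decremented it. */
static int
minttl_accept(uint8_t ttl, uint8_t minttl)
{
	return ttl >= minttl;	/* minttl 0 accepts everything */
}
```

A directly connected BGP peer would use `minttl_for_hops(0)`. Any packet that was forwarded even once then arrives with TTL 254 and is dropped.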


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api; the kernel notifies (EMT_REQUESTSA)
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
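
The infinite-loop fix comes down to bounding the search. In this minimal sketch, each port in the selected range is tried at most once, starting from a random offset, and an exhausted range returns EADDRINUSE. `pick_port()`, the `busy` predicate, and the use of `rand()` are illustrative assumptions. This is not the kernel's port selection code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical predicate: is this local port already taken or reserved? */
typedef int (*port_busy_fn)(unsigned int port, void *arg);

static int
pick_port(unsigned int first, unsigned int last, port_busy_fn busy,
    void *arg, unsigned int *out)
{
	unsigned int n, start, i;

	assert(first <= last && last <= 65535);
	n = last - first + 1;
	start = (unsigned int)rand() % n;	/* random starting offset */
	for (i = 0; i < n; i++) {		/* each candidate at most once */
		unsigned int port = first + (start + i) % n;

		if (!busy(port, arg)) {
			*out = port;
			return 0;
		}
	}
	return EADDRINUSE;			/* range exhausted, no endless loop */
}

/* Example predicate: every port below the threshold counts as taken. */
static int
busy_below(unsigned int port, void *arg)
{
	return port < *(unsigned int *)arg;
}
```

The default, high, and low ranges would just be different `first`/`last` pairs passed to the same bounded search.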


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.137 12-Nov-2023 bluhm

Declare global variable zeroin46_addr as const.

OK mvs@ jca@


Revision tags: OPENBSD_7_4_BASE
# 1.136 24-Jun-2023 bluhm

Calculate inet PCB SIP hash without table mutex.

Goal is to run UDP input in parallel. Btrace kstack analysis shows
that SIP hash for PCB lookup is quite expensive. When running in
parallel, there is also lock contention on the PCB table mutex.

Calculating the hash value before taking the mutex results in better
performance. The hash secret has to be constant as hash
calculation must not depend on values protected by the table mutex.
Do not reseed anymore when hash table gets resized.

Analysis also shows that asserting a rw_lock while holding a mutex
is a bit expensive. Just remove the netlock assert.

OK dlg@ mvs@


Revision tags: OPENBSD_7_3_BASE
# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets return an EINVAL error for
PRU_SOCKADDR requests, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller-provided memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@
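
The safer pattern is to return an error code and copy the result into caller-owned storage before the lock is dropped. In this minimal sketch, `current_ifa`, `select_src()`, and the empty `table_lock()`/`table_unlock()` stubs are hypothetical stand-ins for the kernel's shared state and mutex.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical shared entry whose lifetime is guarded by a lock. */
struct ifaddr_sketch {
	struct in_addr	addr;
};

static struct ifaddr_sketch *current_ifa;	/* may be replaced or freed */

static void table_lock(void) { /* stand-in for taking a kernel mutex */ }
static void table_unlock(void) { /* stand-in for releasing it */ }

static int
select_src(struct in_addr *out)
{
	int error = 0;

	assert(out != NULL);
	table_lock();
	if (current_ifa == NULL)
		error = EADDRNOTAVAIL;
	else
		*out = current_ifa->addr;	/* struct copy made under the lock */
	table_unlock();
	return error;
}
```

Because the caller owns the copy, the shared entry can go away right after `table_unlock()` without leaving the caller with a dangling pointer.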


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
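
In C, `1 << 31` shifts a signed int into its sign bit when int is 32 bits wide, and that is undefined behavior. The usual fix is to shift an unsigned operand. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit mask for bit n of a 32-bit word, without signed overflow. */
static uint32_t
bit32(unsigned int n)
{
	assert(n < 32);
	return (uint32_t)1 << n;	/* unsigned shift is defined for n = 31 */
}
```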


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
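
The upcall is a function pointer plus an opaque argument that is consulted before socket buffer delivery. This minimal sketch uses a simplified signature that is an assumption: it takes the argument and the packet, and returns NULL when the hook has consumed ("stolen") the packet. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	int	len;	/* placeholder packet contents */
};

/* Return NULL if the packet was consumed, else the packet to deliver. */
typedef struct pkt *(*upcall_fn)(void *arg, struct pkt *m);

struct pcb_sketch {
	upcall_fn	 upcall;	/* in the spirit of inp_upcall */
	void		*upcall_arg;	/* in the spirit of inp_upcall_arg */
	int		 queued;	/* packets appended to the socket buffer */
};

static void
deliver(struct pcb_sketch *p, struct pkt *m)
{
	assert(p != NULL);
	if (p->upcall != NULL) {
		m = p->upcall(p->upcall_arg, m);
		if (m == NULL)
			return;		/* stolen, e.g. by a tunnel driver */
	}
	p->queued++;	/* stand-in for appending to the receive buffer */
}

/* Example upcall that steals every packet and counts it. */
static struct pkt *
steal_all(void *arg, struct pkt *m)
{
	(void)m;
	(*(int *)arg)++;
	return NULL;
}
```

A kernel tunnel such as wireguard can then register its own handler on a UDP PCB. The plain socket path stays unchanged when no upcall is set.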


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
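A minimal userland sketch of the idea behind this reference count, using C11 atomics. The kernel uses its own refcnt API from sys/refcnt.h, which is not reproduced here. `struct pcb`, `pcb_alloc()`, `pcb_ref()` and `pcb_unref()` are hypothetical names. Each holder (PCB queue and hashes, a pf mbuf header, a pf state key) takes one reference, and the object is freed only when the last one is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inet PCB; only the reference count matters. */
struct pcb {
	atomic_uint	refs;
};

/* Allocate with one reference, owned by the PCB table. */
static struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		atomic_init(&p->refs, 1);
	return p;
}

/* Take an extra reference, e.g. when a pf state key starts pointing at p. */
static struct pcb *
pcb_ref(struct pcb *p)
{
	atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
	return p;
}

/* Drop a reference; returns true if it was the last one and p was freed. */
static bool
pcb_unref(struct pcb *p)
{
	if (atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel) != 1)
		return false;
	free(p);
	return true;
}
```

The acquire/release ordering on the decrement makes sure every earlier write by other holders is visible before the final `free()`.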


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
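As a self-contained illustration (not OpenBSD's kernel SipHash interface), here is SipHash-2-4 written from the published reference algorithm, plus a hypothetical `pcb_bucket()` that masks the 64-bit result down to a power-of-two table. Because the 128-bit key is a random secret, an attacker cannot predict which addresses and ports collide into one bucket, which is what defeats hash-flooding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound over the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of msg[0..len) under a 128-bit key. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *msg, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b;
	size_t off;

	/* Compression: two rounds per full 8-byte block. */
	for (off = 0; off + 8 <= len; off += 8) {
		uint64_t m = load_le64(msg + off);

		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}

	/* Last block: leftover bytes, with the length mod 256 in the top byte. */
	b = (uint64_t)len << 56;
	for (size_t i = 0; off + i < len; i++)
		b |= (uint64_t)msg[off + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	/* Finalization: four rounds. */
	v2 ^= 0xff;
	for (int i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical bucket selection: hash the lookup tuple, mask to table size. */
static size_t
pcb_bucket(const uint8_t key[16], const uint8_t *tuple, size_t len,
    size_t mask)
{
	return (size_t)(siphash24(key, tuple, len) & mask);
}
```

With key bytes 00..0f and the 15-byte message 00..0e, this yields the reference output 0xa129ca6149be45e5 from the SipHash specification.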


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
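A minimal sketch of the growth rule, assuming power-of-two bucket counts. Comparing `count * 4` against `size * 3` expresses "75%" without floating point. The struct and function names are hypothetical, and the actual rehash of existing entries into the larger table is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table bookkeeping; size is always a power of two. */
struct pcbtable_stats {
	size_t	size;	/* number of hash buckets */
	size_t	count;	/* number of PCBs currently hashed */
};

/* True once the entry count reaches 75% of the bucket count. */
static int
pcbtable_needs_grow(const struct pcbtable_stats *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Record one insertion and return the (possibly doubled) table size. */
static size_t
pcbtable_insert(struct pcbtable_stats *t)
{
	t->count++;
	if (pcbtable_needs_grow(t))
		t->size *= 2;	/* a real table would rehash every entry here */
	return t->size;
}
```

With 128 buckets the 96th insertion (96 is 75% of 128) triggers the doubling to 256; keeping the size a power of two lets the bucket index stay a simple `hash & (size - 1)`.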


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@
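The calling convention described here can be sketched as follows. `struct addr4` and `select_src()` are hypothetical stand-ins (not the kernel's `in_selectsrc()` signature): the function returns 0 or an errno value, and on success writes the chosen address through an out-pointer, so callers can propagate errors with a single check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address type standing in for struct in_addr. */
struct addr4 {
	uint32_t	s_addr;
};

/*
 * Hypothetical source selection: returns 0 and fills in *src on success,
 * otherwise returns an errno value and leaves *src untouched.
 */
static int
select_src(const struct addr4 *bound, const struct addr4 *ifaddr,
    struct addr4 *src)
{
	if (bound != NULL && bound->s_addr != 0) {
		*src = *bound;		/* socket is already bound to an address */
		return 0;
	}
	if (ifaddr == NULL)
		return EADDRNOTAVAIL;	/* no usable interface address */
	*src = *ifaddr;
	return 0;
}
```

A caller then reads naturally as `if ((error = select_src(b, ifa, &src)) != 0) return error;`, with no NULL-pointer return to test in the error path.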


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
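A bitmask covering the full 65536-port space needs one bit per port, i.e. 2048 32-bit words per protocol. The macros below are modeled on the DP_* macros this header provides (DP_CLR is mentioned in revision 1.8); the exact kernel definitions may differ, and `struct baddynamic_map` is a hypothetical name. Shifting `1U` rather than `1` keeps bit 31 well defined.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per port across the whole 16-bit port space. */
#define DP_MAPBITS	32U
#define DP_MAPSIZE	(65536U / DP_MAPBITS)

/* Unsigned 1U so that setting bit 31 is not a signed-shift overflow. */
#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (1U << ((p) % DP_MAPBITS)))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(1U << ((p) % DP_MAPBITS)))
#define DP_ISSET(m, p)	(((m)[(p) / DP_MAPBITS] >> ((p) % DP_MAPBITS)) & 1U)

/* Hypothetical container: separate maps for TCP and UDP. */
struct baddynamic_map {
	uint32_t	tcp[DP_MAPSIZE];
	uint32_t	udp[DP_MAPSIZE];
};
```

Port allocation would then skip any candidate for which `DP_ISSET(map.tcp, port)` is nonzero, and sysctl can flip individual bits with DP_SET and DP_CLR.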


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
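The linking scheme can be sketched with two plain structs that point at each other. `struct pcb`, `struct sk`, and the functions below are hypothetical simplifications, not the kernel's types; `pcb_full_lookup()` is a placeholder for the hash lookup that the cached pointer lets inbound packets skip.

```c
#include <assert.h>
#include <stddef.h>

struct sk;	/* hypothetical pf state key */

struct pcb {
	struct sk	*state_key;	/* cached pf state key, or NULL */
};

struct sk {
	struct pcb	*inp;		/* cached PCB, or NULL */
};

static int full_lookups;	/* counts slow-path lookups, for illustration */

/* Placeholder for the PCB hash lookup; always misses in this sketch. */
static struct pcb *
pcb_full_lookup(void)
{
	full_lookups++;
	return NULL;
}

/* After a successful lookup, link both objects to each other. */
static void
link_pcb_sk(struct pcb *inp, struct sk *sk)
{
	inp->state_key = sk;
	sk->inp = inp;
}

/* When either side goes away, clear both pointers so neither dangles. */
static void
unlink_pcb_sk(struct pcb *inp)
{
	if (inp->state_key != NULL) {
		inp->state_key->inp = NULL;
		inp->state_key = NULL;
	}
}

/* Inbound: use the PCB cached in the state key, else do the full lookup. */
static struct pcb *
input_lookup(struct sk *sk)
{
	if (sk != NULL && sk->inp != NULL)
		return sk->inp;
	return pcb_full_lookup();
}
```

The outbound direction is symmetric: a packet leaving through `inp` can reuse `inp->state_key` instead of searching the pf state table.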


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
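The receive-side test behind the TTL security technique fits in one line. `minttl_accept()` is a hypothetical helper, assuming a minimum of 0 means the option is unset: a packet is dropped when its TTL is below the socket's configured minimum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check: accept a packet unless the socket has a minimum
 * TTL configured (nonzero) and the packet arrived with a lower TTL.
 */
static int
minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
	return minttl == 0 || pkt_ttl >= minttl;
}
```

If both peers send with TTL 255 and each sets the minimum to 255 (via setsockopt(2) with IP_MINTTL at the IPPROTO_IP level), only packets that crossed no router are accepted, since every hop decrements the TTL.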


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
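A small illustration of the rule: ANSI C only guarantees bitfields declared as `int`, `unsigned int`, or `_Bool`, so narrow flag fields are written as `unsigned int` with an explicit width. `struct pcb_flags` and its members are hypothetical.

```c
#include <assert.h>

/* Hypothetical flag word using only portable bitfield base types. */
struct pcb_flags {
	unsigned int	bound:1;
	unsigned int	connected:1;
	unsigned int	portrange:2;	/* holds values 0..3 */
};
```

A `u_char` bitfield is accepted by many compilers as an extension, but its behavior is implementation-defined; `unsigned int` avoids that.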


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
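The three IP_PORTRANGE ranges differ only in their bounds, so selection reduces to searching one interval. This is a hypothetical sketch, not the kernel's port picker: `inuse[p]` marks taken ports, `rand()` stands in for a kernel random number generator, and each candidate is tried at most once, so an exhausted range returns 0 instead of looping forever as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical picker: search [first, last] starting at a random offset.
 * Returns a free port, or 0 if every port in the range is taken
 * (0 is never a valid dynamic port, so it works as a sentinel).
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, const uint8_t *inuse)
{
	uint32_t n, start, i;

	assert(first <= last);
	n = (uint32_t)last - first + 1;
	start = (uint32_t)rand() % n;	/* rand() stands in for a kernel RNG */
	for (i = 0; i < n; i++) {
		uint16_t port = (uint16_t)(first + (start + i) % n);

		if (!inuse[port])
			return port;
	}
	return 0;	/* range exhausted; a caller would report EADDRINUSE */
}
```

The default, high, and low ranges would each supply their own `first` and `last` (in OpenBSD, the first two from the sysctls named above).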


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
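The growth rule can be sketched in C as follows. This is a hypothetical helper, not the in_pcb.c code. It only decides the new bucket count, and it checks the 75% threshold without floating point by comparing 4 * count against 3 * size. A real resize also has to rehash every entry into the new buckets.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: return the bucket count to use after an insert.
 * Double the table once the number of entries reaches 75% of its size.
 */
static size_t
next_table_size(size_t size, size_t count)
{
	/* count >= 0.75 * size, expressed in integers */
	if (count * 4 >= size * 3)
		return size * 2;
	return size;
}
```

For example, a 128-bucket table stays at 128 with 95 entries and doubles to 256 when it reaches 96.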


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
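A bitmask that covers the whole port space needs one bit per port: 65536 bits, which is 2048 32-bit words or 8 KB per protocol. The C sketch below shows set, clear and test operations on such a map. It follows the idea behind the DP_CLR() macro mentioned at revision 1.8. struct portmap and the port_*() helpers are hypothetical stand-ins; the header's real macro definitions are not shown in this log and may differ.

```c
#include <stdint.h>

/* Hypothetical sketch: one bit per TCP or UDP port, 0..65535. */
#define NPORTS		65536
#define MAPBITS		32			/* bits per word */
#define MAPSIZE		(NPORTS / MAPBITS)	/* 2048 words */

struct portmap {
	uint32_t	bits[MAPSIZE];
};

static void
port_set(struct portmap *m, uint16_t port)
{
	m->bits[port / MAPBITS] |= (uint32_t)1 << (port % MAPBITS);
}

static void
port_clr(struct portmap *m, uint16_t port)
{
	m->bits[port / MAPBITS] &= ~((uint32_t)1 << (port % MAPBITS));
}

/* Returns 1 if the port is marked as not to be picked dynamically. */
static int
port_isset(const struct portmap *m, uint16_t port)
{
	return (m->bits[port / MAPBITS] >> (port % MAPBITS)) & 1;
}
```

Using an unsigned one-bit constant also avoids the signed left-shift problem when the port lands in bit 31 of a word.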


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
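Here is a small illustration of the rule, using a hypothetical struct. C99 requires a bit-field to have type _Bool, signed int or unsigned int. Any other type, such as u_char, is accepted only as an implementation-defined extension. Declaring the fields as unsigned int (which u_int names on OpenBSD) keeps the declaration portable.

```c
#include <assert.h>

/*
 * Hypothetical flag word, not a structure from in_pcb.h.
 * Non-portable:  u_char       ef_level:3;
 * Portable:      unsigned int ef_level:3;
 */
struct example_flags {
	unsigned int	ef_enabled:1;	/* holds 0 or 1 */
	unsigned int	ef_level:3;	/* holds 0 .. 7 */
	unsigned int	ef_unused:28;
};
```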


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.135 03-Oct-2022 bluhm

System calls should not fail due to temporary memory shortage in
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@


Revision tags: OPENBSD_7_2_BASE
# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
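
A minimal single-threaded sketch of the idea, assuming a hypothetical
struct pcb with a plain counter. The kernel uses the refcnt API from
sys/refcnt.h with atomic operations, which this does not reproduce.
Each holder (PCB queue, hashes, pf mbuf header, pf state key) takes a
reference, and the PCB is freed only when the last one is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inet PCB. */
struct pcb {
	unsigned int refs;	/* number of current holders */
};

static int pcbs_freed;	/* counts frees, for demonstration only */

static struct pcb *
pcb_alloc(void)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->refs = 1;	/* reference owned by the creator */
	return p;
}

/* Take an extra reference, e.g. when a pf state key links to the PCB. */
static struct pcb *
pcb_ref(struct pcb *p)
{
	assert(p->refs > 0);
	p->refs++;
	return p;
}

/* Drop a reference; free the PCB when nobody holds it anymore. */
static void
pcb_unref(struct pcb *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0) {
		free(p);
		pcbs_freed++;
	}
}
```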


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
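
For reference, a compact sketch of SipHash-2-4 applied to a connection
tuple. siphash24(), tuple_bucket() and the byte layout are hypothetical
illustrations; the kernel has its own SipHash functions, which this does
not reproduce. The mask assumes a power-of-two table, as with the hash
mask kept in the table. Without the secret 128-bit key, a remote sender
cannot predict which bucket a chosen tuple lands in, so it cannot
deliberately pile entries into one chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t
load64_le(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4: 2 rounds per message word, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *msg, size_t len)
{
	uint64_t k0 = load64_le(key), k1 = load64_le(key + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t m, b = (uint64_t)len << 56;
	size_t i, tail = len & 7;

	for (i = 0; i + 8 <= len; i += 8) {
		m = load64_le(msg + i);
		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	/* Last word: leftover bytes, message length in the top byte. */
	for (i = 0; i < tail; i++)
		b |= (uint64_t)msg[len - tail + i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	v2 ^= 0xff;
	for (i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical bucket selection for an IPv4 tuple; mask = size - 1. */
static uint64_t
tuple_bucket(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, uint64_t mask)
{
	uint8_t buf[12];
	int i;

	for (i = 0; i < 4; i++) {
		buf[i] = faddr >> (24 - 8 * i);
		buf[4 + i] = laddr >> (24 - 8 * i);
	}
	buf[8] = fport >> 8;  buf[9] = fport;
	buf[10] = lport >> 8; buf[11] = lport;
	return siphash24(key, buf, sizeof(buf)) & mask;
}
```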


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
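
A minimal sketch of that growth rule, assuming a hypothetical chained
hash table whose size stays a power of two so the mask is size - 1.
hash_u32() is a stand-in integer mixer, not the kernel's keyed hash,
and locking is omitted. When an insert would bring the entry count to
75% of the bucket count, the table is doubled and every entry is
rehashed into the new buckets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	uint32_t key;
};

struct table {
	struct entry **buckets;
	size_t size;	/* number of buckets, power of two */
	size_t count;	/* number of entries */
};

/* Stand-in mixing function; not the kernel's hash. */
static uint32_t
hash_u32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

static int
table_init(struct table *t, size_t size)
{
	t->buckets = calloc(size, sizeof(*t->buckets));
	t->size = size;
	t->count = 0;
	return t->buckets == NULL ? -1 : 0;
}

/* Double the bucket array and move every entry to its new chain. */
static int
table_grow(struct table *t)
{
	size_t nsize = t->size * 2, i;
	struct entry **nb = calloc(nsize, sizeof(*nb));
	struct entry *e, *next;

	if (nb == NULL)
		return -1;	/* keep the old table; it still works */
	for (i = 0; i < t->size; i++) {
		for (e = t->buckets[i]; e != NULL; e = next) {
			size_t h = hash_u32(e->key) & (nsize - 1);

			next = e->next;
			e->next = nb[h];
			nb[h] = e;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
	return 0;
}

static void
table_insert(struct table *t, struct entry *e)
{
	size_t h;

	/* Grow once the entry count reaches 75% of the bucket count. */
	if ((t->count + 1) * 4 >= t->size * 3)
		(void)table_grow(t);
	h = hash_u32(e->key) & (t->size - 1);
	e->next = t->buckets[h];
	t->buckets[h] = e;
	t->count++;
}

static struct entry *
table_find(const struct table *t, uint32_t key)
{
	struct entry *e;

	for (e = t->buckets[hash_u32(key) & (t->size - 1)]; e != NULL;
	    e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}
```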


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
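
A minimal sketch of a bitmap covering the full 65536 port space with
32-bit words. The PORTMAP_* macros, struct portmaps and the sample
entries are hypothetical, written in the spirit of the DP_* macros
rather than copied from this header; the sample ports are ones this
log mentions (TCP 587 and 853, UDP 623 and 664). The shifts use 1U
because shifting a signed 1 left by 31 is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PORT_SPACE	65536
#define MAPBITS		32			/* bits per uint32_t word */
#define MAPSIZE		(PORT_SPACE / MAPBITS)	/* 2048 words */

/* Hypothetical names; 1U keeps the shift unsigned for bit 31. */
#define PORTMAP_SET(m, p)	((m)[(p) / MAPBITS] |= (1U << ((p) % MAPBITS)))
#define PORTMAP_CLR(m, p)	((m)[(p) / MAPBITS] &= ~(1U << ((p) % MAPBITS)))
#define PORTMAP_ISSET(m, p)	(((m)[(p) / MAPBITS] >> ((p) % MAPBITS)) & 1U)

struct portmaps {
	uint32_t tcp[MAPSIZE];
	uint32_t udp[MAPSIZE];
};

static void
portmaps_init(struct portmaps *pm)
{
	memset(pm, 0, sizeof(*pm));
	/* Sample ports a dynamic (random) port choice should skip. */
	PORTMAP_SET(pm->tcp, 587);
	PORTMAP_SET(pm->tcp, 853);
	PORTMAP_SET(pm->udp, 623);
	PORTMAP_SET(pm->udp, 664);
}
```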


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
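
A minimal sketch of the two-way link, assuming hypothetical struct
statekey and struct pcb types; pf's real structures, reference
counting and locking are omitted. Whichever side is removed first
clears both pointers, so neither side is left pointing at freed memory.

```c
#include <assert.h>
#include <stddef.h>

struct pcb;

struct statekey {
	struct pcb *inp;	/* cached pcb, used for inbound packets */
};

struct pcb {
	struct statekey *sk;	/* cached state key, used for outbound */
};

/* After a pcb lookup for a packet that carried a state key. */
static void
link_pcb_statekey(struct pcb *inp, struct statekey *sk)
{
	inp->sk = sk;
	sk->inp = inp;
}

/* The pcb goes away: clear the state key's back pointer too. */
static void
unlink_pcb(struct pcb *inp)
{
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}

/* The state key goes away: clear the pcb's pointer too. */
static void
unlink_statekey(struct statekey *sk)
{
	if (sk->inp != NULL) {
		sk->inp->sk = NULL;
		sk->inp = NULL;
	}
}

/* Inbound: use the cached pcb if present, else fall back to a lookup. */
static struct pcb *
inbound_pcb(struct statekey *sk, struct pcb *(*slow_lookup)(void))
{
	if (sk->inp != NULL)
		return sk->inp;
	return slow_lookup();
}
```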


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will
be brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.134 03-Sep-2022 mvs

Move PRU_PEERADDR request to (*pru_peeraddr)().

Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.

Also remove *_usrreq() handlers.

ok bluhm@


# 1.133 03-Sep-2022 mvs

Move PRU_SOCKADDR request to (*pru_sockaddr)()

Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.

The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.

ok bluhm@


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
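
1.111 lists three kinds of holders for an inp: the PCB queue and hashes, a pf mbuf header, and a pf state key. Each holder owns one reference, and the PCB is released only when the last reference is dropped. The sketch below is a minimal, hypothetical model of that rule. The type and function names are invented for illustration. The kernel uses its refcnt API from sys/refcnt.h (see 1.112) with atomic operations, not plain counters.

```c
#include <assert.h>

/*
 * Hypothetical, single-threaded model of a reference-counted PCB.
 * The kernel uses atomic refcnt operations instead of plain counters.
 */
struct pcb_model {
	unsigned int	refs;	/* holders: queue/hashes, pf mbuf, pf key */
	int		freed;	/* set once the last reference is gone */
};

static void
pcb_model_init(struct pcb_model *p)
{
	p->refs = 1;		/* the creator holds the first reference */
	p->freed = 0;
}

static struct pcb_model *
pcb_model_ref(struct pcb_model *p)
{
	assert(p->refs > 0);	/* never take a reference on a dead PCB */
	p->refs++;
	return p;
}

static void
pcb_model_unref(struct pcb_model *p)
{
	assert(p->refs > 0);	/* catch unbalanced unref calls */
	if (--p->refs == 0)
		p->freed = 1;	/* a kernel would return it to its pool here */
}
```

For example, suppose a PCB is created with one reference and a pf state key then takes a second. The PCB survives the first unref and is released by the second.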


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
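
The growth rule in 1.86 can be shown with a small model. The sketch assumes power-of-two bucket counts, so a bucket is selected with hash & mask. The names are hypothetical. The model tracks only the sizes and does not move any entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an auto-resizing hash table (sizes only). */
struct table_model {
	size_t size;	/* number of buckets, always a power of two */
	size_t mask;	/* size - 1, used to pick a bucket */
	size_t count;	/* number of entries currently hashed */
};

static void
table_insert(struct table_model *t)
{
	/* invariant: mask == size - 1 and size is a power of two */
	assert(t->mask == t->size - 1 && (t->size & t->mask) == 0);
	t->count++;
	if (t->count * 4 >= t->size * 3) {	/* load reached 75% */
		t->size *= 2;
		t->mask = t->size - 1;
		/* a real table would allocate new buckets and rehash here */
	}
}
```

With 8 buckets, the sixth insertion reaches 6/8 = 75%, and the table doubles to 16 buckets.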


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
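
The sketch below shows the kind of bitmask 1.65 describes, with one bit for each of the 65536 ports. The macro names are hypothetical, and the layout of 32-bit words is an assumption. The mask uses an unsigned 1U because shifting a signed 1 left by 31 is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port bitmask covering the full 16-bit port space. */
#define PORTMAP_BITS	32			/* bits per word */
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)	/* 2048 words, 8 KB */

/* 1U, not 1: a signed 1 << 31 would be undefined behavior. */
#define PORTMAP_SET(m, p) \
	((m)[(p) / PORTMAP_BITS] |= (1U << ((p) % PORTMAP_BITS)))
#define PORTMAP_CLR(m, p) \
	((m)[(p) / PORTMAP_BITS] &= ~(1U << ((p) % PORTMAP_BITS)))
#define PORTMAP_ISSET(m, p) \
	(((m)[(p) / PORTMAP_BITS] >> ((p) % PORTMAP_BITS)) & 1U)

/* Zero-initialized, so no port is reserved until a bit is set. */
static uint32_t bad_tcp[PORTMAP_WORDS];
```

For example, reserving TCP port 853 (added in 1.118) sets bit 853 % 32 = 21 of word 853 / 32 = 26.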


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
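
The pointer pairing in 1.64 can be modeled as below. This is a simplified, hypothetical sketch with invented struct and function names. It shows only the invariant that both cached pointers are set together and cleared together. It does not model the real pf or PCB structures or their locking.

```c
#include <assert.h>
#include <stddef.h>

struct pcb;

struct state_key {
	struct pcb *inp;		/* cached PCB, or NULL */
};

struct pcb {
	struct state_key *sk;		/* cached state key, or NULL */
};

/* Called after the first successful PCB lookup for a packet. */
static void
link_sk_pcb(struct state_key *sk, struct pcb *inp)
{
	/* in this model each side has at most one partner */
	assert(sk->inp == NULL && inp->sk == NULL);
	sk->inp = inp;
	inp->sk = sk;
}

/* Called when either the state key or the PCB is removed. */
static void
unlink_sk_pcb(struct state_key *sk, struct pcb *inp)
{
	if (sk != NULL)
		sk->inp = NULL;
	if (inp != NULL)
		inp->sk = NULL;
}
```

On later inbound packets, a non-NULL sk->inp lets the caller skip the PCB hash lookup. On outbound packets, a non-NULL inp->sk lets it skip the state key lookup.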


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
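
The deadlock fix quoted in 1.3 comes down to bounding the search. The following is a minimal sketch under two assumptions, both hypothetical here: candidates come from the range [first, last], and a caller-supplied in_use() predicate stands in for the PCB lookup. Each port is tried at most once. When every port in the range is taken, the function returns an error instead of looping forever.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 0 and stores a free port in *out, or -1 if all are in use. */
static int
pick_port(uint16_t first, uint16_t last, uint16_t start,
    int (*in_use)(uint16_t), uint16_t *out)
{
	uint32_t n, off, i;

	assert(first <= last && first <= start && start <= last);
	n = (uint32_t)last - first + 1;		/* number of candidates */
	off = (uint32_t)start - first;		/* offset of the first try */

	for (i = 0; i < n; i++) {
		/* wrap around so every port in the range is tried once */
		uint16_t port = (uint16_t)(first + (off + i) % n);

		if (!in_use(port)) {
			*out = port;
			return 0;
		}
	}
	return -1;				/* range exhausted */
}

/* Example predicate for illustration only: ports below 1030 are taken. */
static int
demo_in_use(uint16_t port)
{
	return port < 1030;
}
```

A kernel would randomize start. It is a parameter here so that the behavior is deterministic.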


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.132 30-Aug-2022 bluhm

Refactor internet PCB lookup function. Rename in_pcbhashlookup()
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and is now not only
unnecessary but could also create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.131 22-Aug-2022 bluhm

Use rwlock per inpcb table to protect notify list. The notify
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@


# 1.130 21-Aug-2022 bluhm

Introduce a mutex per inpcb to serialize access to socket receive
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@


# 1.129 15-May-2022 dlg

have in_pcbselsrc copy the selected address to memory provided by the caller.

having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.

discussed with visa@
ok bluhm@ claudio@


Revision tags: OPENBSD_7_1_BASE
# 1.128 21-Mar-2022 bluhm

For multicast and broadcast packets udp_input() traverses the loop
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the incpb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this protects it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
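
The following is a self-contained sketch of keyed bucket selection using SipHash-2-4. It is written from the published SipHash description, not copied from the kernel's siphash code. The helper pcb_bucket_sketch() and its IPv4-only 4-tuple layout are hypothetical. The point is that an attacker who does not know the 16-byte key cannot predict which bucket a given address/port tuple lands in. That makes it hard to flood a single hash chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* One SipRound over the four state words v0..v3. */
#define SIPROUND do { \
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const uint8_t *in, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	size_t end = len - (len % 8);

	for (size_t i = 0; i < end; i += 8) {
		uint64_t m = load_le64(in + i);
		v3 ^= m;
		SIPROUND; SIPROUND;
		v0 ^= m;
	}
	/* Final block: leftover bytes, input length in the top byte. */
	uint64_t b = (uint64_t)len << 56;
	for (size_t i = 0; i < len % 8; i++)
		b |= (uint64_t)in[end + i] << (8 * i);
	v3 ^= b;
	SIPROUND; SIPROUND;
	v0 ^= b;
	v2 ^= 0xff;
	SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical helper: map an IPv4 4-tuple to a bucket of a 2^n table. */
static size_t
pcb_bucket_sketch(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, size_t mask)
{
	uint8_t buf[12];

	assert((mask & (mask + 1)) == 0);	/* table size is a power of two */
	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &fport, 2);
	memcpy(buf + 6, &laddr, 4);
	memcpy(buf + 10, &lport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

Revision 1.113 later gives the address hash and the local-port hash separate keys. In this sketch, that would mean calling siphash24() with a second, independently generated key for the port hash.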


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
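
The growth rule from revision 1.86 can be illustrated as follows. This is a hypothetical sketch, not the in_pcb.c code: the struct and function names are made up. The test "count reaches 75% of size" is written as count * 4 >= size * 3 to stay in integer arithmetic. Rehashing the existing entries is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table bookkeeping; names are illustrative only. */
struct pcbtable_sketch {
	size_t count;	/* entries currently hashed */
	size_t size;	/* number of buckets, a power of two */
};

/* Nonzero once the number of entries has reached 75% of the buckets. */
static int
needs_resize(const struct pcbtable_sketch *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Account for one new entry and double the table at the threshold. */
static void
insert_entry(struct pcbtable_sketch *t)
{
	t->count++;
	if (needs_resize(t))
		t->size *= 2;	/* real code must rehash every entry here */
}
```

Doubling keeps the size a power of two, so a bucket can still be chosen with a mask (size - 1) instead of a modulo.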


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kills some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
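
Revision 1.65 widens the baddynamic bitmaps to one bit per port across all 65536 ports. Below is a sketch of such a bitmap with set, clear and test macros, in the spirit of the DP_CLR() macro from revision 1.8. The SK_* names and the struct are hypothetical, and 32-bit map words are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per port over the full 16-bit port space. */
#define SK_MAPBITS	32u
#define SK_MAPSIZE	(65536u / SK_MAPBITS)	/* 2048 words */

/* Use 1u, not 1: shifting a signed int left by 31 is undefined in C. */
#define SK_SET(m, p)	((m)[(p) / SK_MAPBITS] |= (1u << ((p) % SK_MAPBITS)))
#define SK_CLR(m, p)	((m)[(p) / SK_MAPBITS] &= ~(1u << ((p) % SK_MAPBITS)))
#define SK_ISSET(m, p)	(((m)[(p) / SK_MAPBITS] >> ((p) % SK_MAPBITS)) & 1u)

/* Hypothetical per-protocol maps of ports never chosen dynamically. */
struct baddynamic_sketch {
	uint32_t tcp[SK_MAPSIZE];
	uint32_t udp[SK_MAPSIZE];
};
```

Each map costs 2048 * 4 = 8192 bytes (8 KiB). In exchange, a port check is a single word lookup and bit test, which stays cheap while picking a random local port.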


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header
files, so new code has to use the RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
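
The reversed match order from revision 1.48 can be sketched with a linear table. In this sketch, a socket bound to the destination address normally wins over a wildcard socket on the same port. For a packet redirected to localhost, the wildcard is tried first. Assumptions: IPv4 only, a plain int flag standing in for the pf mbuf tag (called PF_TAG_TRANSLATE_LOCALHOST in revision 1.106), and hypothetical names throughout. The real lookup goes through the PCB hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ADDR_ANY	0u	/* wildcard local address */

struct listener_sketch {
	uint32_t laddr;		/* bound local address or SK_ADDR_ANY */
	uint16_t lport;		/* bound local port */
};

/* Return the listener bound to exactly addr:port, or NULL. */
static const struct listener_sketch *
find_exact(const struct listener_sketch *tab, size_t n, uint32_t addr,
    uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].laddr == addr && tab[i].lport == port)
			return &tab[i];
	return NULL;
}

/*
 * Prefer the specific address normally; if the packet was translated
 * to localhost, prefer the wildcard so it does not look local.
 */
static const struct listener_sketch *
lookup_listen_sketch(const struct listener_sketch *tab, size_t n,
    uint32_t dst, uint16_t port, int translated_to_localhost)
{
	uint32_t first = translated_to_localhost ? SK_ADDR_ANY : dst;
	uint32_t second = translated_to_localhost ? dst : SK_ADDR_ANY;
	const struct listener_sketch *l = find_exact(tab, n, first, port);

	return l != NULL ? l : find_exact(tab, n, second, port);
}
```

Consider a daemon like portmap that listens on both 127.0.0.1:111 and *:111. A redirected connection is then handed to the wildcard socket, so the daemon does not treat it as a local client.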


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the incpb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-ifp
variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
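The growth rule in 1.86 (double the table once the entry count reaches 75% of the table size) can be illustrated with a small C sketch. The function name next_table_size() and the reading of "reaches" as count >= 3/4 of the size are assumptions for illustration only; this is not the code from in_pcb.c.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the 1.86 resize policy: double the hash
 * table once the number of entries reaches 75% of its size.
 * "Reaches" is assumed to mean count >= 3/4 * size.
 */
static size_t
next_table_size(size_t size, size_t count)
{
	/* count >= size * 3 / 4, rearranged to avoid truncating division */
	if (count * 4 >= size * 3)
		return size * 2;
	return size;
}
```

With a 128-entry table, 95 entries keep the size at 128 and the 96th entry triggers growth to 256. Comparing count * 4 against size * 3 keeps the check in integer arithmetic.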


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get the original
(= before divert) destination port of a UDP packet. This option is used
the same way as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option, used at the IPPROTO_IP
level, which allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
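To picture what 1.65 means by a bitmask covering the entire 65536-port space, here is a minimal C sketch with one bit per port, stored in 2048 32-bit words. The names portmap, portmap_set(), portmap_clr() and portmap_isset() are hypothetical and are not the definitions in in_pcb.h; the sketch only shows the general set/clear/test technique.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of a 65536-bit port map in the spirit of the
 * baddynamic tables: one bit per TCP or UDP port.  Names are
 * illustrative, not the definitions from in_pcb.h.
 */
#define PORTMAP_WORDBITS 32u
#define PORTMAP_WORDS    (65536u / PORTMAP_WORDBITS)	/* 2048 words */

struct portmap {
	uint32_t word[PORTMAP_WORDS];
};

/* 1u keeps the shift unsigned, so shifting into bit 31 is well defined. */
static void
portmap_set(struct portmap *m, uint16_t port)
{
	m->word[port / PORTMAP_WORDBITS] |= 1u << (port % PORTMAP_WORDBITS);
}

static void
portmap_clr(struct portmap *m, uint16_t port)
{
	m->word[port / PORTMAP_WORDBITS] &= ~(1u << (port % PORTMAP_WORDBITS));
}

static int
portmap_isset(const struct portmap *m, uint16_t port)
{
	return (m->word[port / PORTMAP_WORDBITS] >> (port % PORTMAP_WORDBITS)) & 1u;
}
```

Port 587 lands in word 18, bit 11; port 65535 lands in the last word at bit 31, which is why the shift uses an unsigned constant.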


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header files,
so new code has to use the RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected)
reduced the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api: the kernel signals (EMT_REQUESTSA)
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
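
A bitmask covering the whole 65536-port space needs one bit per port, i.e. 2048 32-bit words per protocol. The sketch below shows that layout with set/clear/test macros. The names are hypothetical and only echo the DP_* macros mentioned in this log; the real header definitions may differ. The shifts use an unsigned 1, since shifting a signed 1 left by 31 is undefined behavior in C (the issue fixed in revision 1.122).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout: one bit per port, 32 ports per word. */
#define PORT_SPACE 65536u
#define MAP_BITS   32u
#define MAP_WORDS  (PORT_SPACE / MAP_BITS)  /* 2048 words, 8 KiB */

/* (uint32_t)1 keeps the shift unsigned, so bit 31 is well defined. */
#define BIT_SET(m, p)   ((m)[(p) / MAP_BITS] |= ((uint32_t)1 << ((p) % MAP_BITS)))
#define BIT_CLR(m, p)   ((m)[(p) / MAP_BITS] &= ~((uint32_t)1 << ((p) % MAP_BITS)))
#define BIT_ISSET(m, p) (((m)[(p) / MAP_BITS] & ((uint32_t)1 << ((p) % MAP_BITS))) != 0)

/* Hypothetical per-protocol maps of ports to skip when picking dynamically. */
struct bad_ports {
	uint32_t tcp[MAP_WORDS];
	uint32_t udp[MAP_WORDS];
};

static void bad_port_set(uint32_t *map, unsigned int port)
{
	assert(port < PORT_SPACE);
	BIT_SET(map, port);
}
```

Port 6000 in the test is only an example value; bit 31 of word 0 (UDP port 31) exercises the top bit that a signed shift would get wrong.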


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
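
The quoted entry mentions an infinite loop when the local port range ran out. A minimal sketch of a bounded search is below: it tries each port in [first, last] at most once and reports failure instead of spinning. `pick_port`, the `port_busy_fn` callback and `busy_in_list` are hypothetical; the kernel's selection code differs (for example in how it chooses the starting candidate), and a sequential scan is used here only to keep the termination argument obvious.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: is this port unusable (bound, or reserved)? */
typedef bool (*port_busy_fn)(uint16_t port, void *arg);

/*
 * Pick a free port in [first, last], starting just after *lastport.
 * At most (last - first + 1) candidates are tried, so an exhausted
 * range returns -1 instead of looping forever.
 */
static int pick_port(uint16_t first, uint16_t last, uint16_t *lastport,
    port_busy_fn busy, void *arg)
{
	assert(first <= last);
	uint32_t span = (uint32_t)last - first + 1;
	uint32_t cand = *lastport;  /* 32 bits, so cand + 1 cannot wrap */

	for (uint32_t tries = 0; tries < span; tries++) {
		cand++;
		if (cand < first || cand > last)
			cand = first;  /* wrap around inside the range */
		if (!busy((uint16_t)cand, arg)) {
			*lastport = (uint16_t)cand;
			return (int)cand;
		}
	}
	return -1;  /* every port in the range is busy */
}

/* Example predicate: arg points to a zero-terminated list of bound ports. */
static bool busy_in_list(uint16_t port, void *arg)
{
	for (const uint16_t *p = arg; *p != 0; p++)
		if (*p == port)
			return true;
	return false;
}
```

With ports 1024 and 1025 bound, a search over 1024-1027 returns 1026 and then 1027; once all four are bound it returns -1 after four tries.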


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.127 21-Mar-2022 bluhm

Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex
for PCB tables. It does not break userland build anymore.

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.126 20-Mar-2022 bluhm

Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
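A minimal sketch of a bitmap covering the full 65536-port space, assuming 32-bit words. The names `port_bitmap`, `PB_SET`, `PB_CLR`, `PB_ISSET` and `port_bitmap_init()` are hypothetical; they illustrate the idea behind the header's DP_* macros and are not their actual definitions.

```c
#include <stdint.h>
#include <string.h>

/* One bit per port: 65536 ports / 32 bits per word = 2048 words. */
#define PB_BITS		32u
#define PB_WORDS	(65536u / PB_BITS)

/* Hypothetical stand-in for a baddynamic port map. */
struct port_bitmap {
	uint32_t	map[PB_WORDS];
};

/* Unsigned shifts keep bit 31 well defined. */
#define PB_SET(m, p)	((m)->map[(p) / PB_BITS] |= (1u << ((p) % PB_BITS)))
#define PB_CLR(m, p)	((m)->map[(p) / PB_BITS] &= ~(1u << ((p) % PB_BITS)))
#define PB_ISSET(m, p)	(((m)->map[(p) / PB_BITS] >> ((p) % PB_BITS)) & 1u)

/* Clear the map, then mark every port a dynamic allocator must skip. */
static void
port_bitmap_init(struct port_bitmap *m, const uint16_t *ports, size_t n)
{
	size_t i;

	memset(m, 0, sizeof(*m));
	for (i = 0; i < n; i++)
		PB_SET(m, ports[i]);
}
```

Ports such as 587 (submission) or 6000 (X11) would be marked once, and a dynamic port allocator would then skip any candidate for which PB_ISSET() is true.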


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
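The pointer caching described in this entry can be sketched as follows. `struct state_key`, `struct pcb` and all functions here are hypothetical stand-ins for pf state keys and inpcbs, and the linear `full_lookup()` stands in for the real hashed lookup; locking and reference counting are ignored.

```c
#include <stddef.h>

struct pcb;

/* Hypothetical pf state key with a cached socket lookup result. */
struct state_key {
	struct pcb	*sk_pcb;
};

/* Hypothetical PCB with a cached pf state key (used on output). */
struct pcb {
	struct state_key *pcb_sk;
	int		  pcb_id;
};

static unsigned long full_lookups;	/* counts slow-path lookups */

/* Slow path: stands in for the PCB table lookup. */
static struct pcb *
full_lookup(struct pcb *table, size_t n, int id)
{
	size_t i;

	full_lookups++;
	for (i = 0; i < n; i++)
		if (table[i].pcb_id == id)
			return (&table[i]);
	return (NULL);
}

/* Inbound: reuse the PCB cached in the state key if there is one. */
static struct pcb *
inbound_lookup(struct state_key *sk, struct pcb *table, size_t n, int id)
{
	struct pcb *pcb;

	if (sk->sk_pcb != NULL)
		return (sk->sk_pcb);
	pcb = full_lookup(table, n, id);
	if (pcb != NULL) {
		sk->sk_pcb = pcb;	/* link both directions */
		pcb->pcb_sk = sk;
	}
	return (pcb);
}

/* Removing either object clears both pointers. */
static void
unlink_pcb(struct pcb *pcb)
{
	if (pcb->pcb_sk != NULL) {
		pcb->pcb_sk->sk_pcb = NULL;
		pcb->pcb_sk = NULL;
	}
}

static void
unlink_sk(struct state_key *sk)
{
	if (sk->sk_pcb != NULL) {
		sk->sk_pcb->pcb_sk = NULL;
		sk->sk_pcb = NULL;
	}
}
```

After the first inbound packet links the pair, later packets get the cached pointer without a table lookup, and outbound code can read pcb_sk directly. Freeing either object must go through an unlink call so that no dangling pointer survives.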


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
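A simplified model of the receive-side test this option enables; `minttl_accept()` is a hypothetical helper, not the kernel's code, and a minimum of 0 is assumed to mean the option is unset. A directly connected sender using TTL 255 arrives with TTL 255, so requiring 255 drops anything that crossed a router, while 254 tolerates exactly one hop.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a packet only if its TTL is at least the configured minimum.
 * Each router on the path decrements the TTL by one, so with the sender
 * using 255, a minimum of 255 - hops tolerates up to `hops` routers.
 * A minimum of 0 accepts everything (option not set).
 */
static bool
minttl_accept(uint8_t minttl, uint8_t pkt_ttl)
{
	return (pkt_ttl >= minttl);
}
```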


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder
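A minimal illustration of the rule, using a hypothetical `struct example_flags`: C89 only guarantees bit-fields declared as int, signed int or unsigned int (C99 adds _Bool), so flag bits are declared unsigned, which is what u_int expands to. Unsigned also sidesteps the implementation-defined signedness of a plain int bit-field, where a 1-bit field might only hold 0 and -1.

```c
/*
 * Hypothetical flag word. Declaring these as u_char would rely on an
 * implementation-defined extension; unsigned int is portable.
 */
struct example_flags {
	unsigned int	ef_bound:1;	/* local address/port assigned */
	unsigned int	ef_connected:1;	/* foreign address set */
	unsigned int	ef_spare:30;	/* reserved */
};
```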


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now is not only
unnecessary but could also create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
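A minimal userland sketch of the setsockopt(2) call quoted above, assuming the BSD constants IP_PORTRANGE and IP_PORTRANGE_HIGH from <netinet/in.h>; `use_high_portrange()` is a hypothetical helper. On systems without IP_PORTRANGE the sketch fails with ENOPROTOOPT instead of guessing a value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * Ask the kernel to pick the local port from the "high" range when
 * the socket is later bound or connected with port 0.
 */
static int
use_high_portrange(int s)
{
#if defined(IP_PORTRANGE) && defined(IP_PORTRANGE_HIGH)
	int arg = IP_PORTRANGE_HIGH;

	return (setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg, sizeof(arg)));
#else
	(void)s;		/* option not available on this system */
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

It would be called on a fresh socket before bind(2) or connect(2), so that the kernel chooses the port between net.inet.ip.porthifirst and net.inet.ip.porthilast.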


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.125 14-Mar-2022 tb

Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul

This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.

ok sthen


# 1.124 14-Mar-2022 bluhm

pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@
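A minimal illustration, assuming a 32-bit int: `1 << 31` shifts a one into the sign bit of a signed int, which C leaves undefined, whereas `1U << 31` is well defined. The helper `bit()` is hypothetical; it shows the pattern, not the actual diff.

```c
#include <stdint.h>

/* Build a single-bit mask; the unsigned constant keeps n = 31 defined. */
static uint32_t
bit(unsigned int n)
{
	return ((uint32_t)1U << n);	/* caller ensures n < 32 */
}
```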


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
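The ownership rule can be sketched with a plain counter. `struct ref_pcb` and its helpers are hypothetical; a kernel implementation would use atomic operations (OpenBSD provides refcnt(9)) rather than the non-atomic increments shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical PCB carrying only its reference count. */
struct ref_pcb {
	unsigned int	refs;	/* table, pf mbuf header, pf state key, ... */
};

static struct ref_pcb *
pcb_alloc(void)
{
	struct ref_pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->refs = 1;	/* reference owned by the PCB table */
	return (p);
}

/* Take an additional reference, e.g. when pf caches the pointer. */
static struct ref_pcb *
pcb_ref(struct ref_pcb *p)
{
	p->refs++;
	return (p);
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int
pcb_unref(struct ref_pcb *p)
{
	assert(p->refs > 0);
	if (--p->refs > 0)
		return (0);
	free(p);
	return (1);
}
```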


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as the only objects referenced by the ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
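The growth rule can be sketched as follows, assuming power-of-two table sizes so that a bucket is selected by masking the hash with `size - 1`. Both helpers are hypothetical and only illustrate the 75% threshold; they do not reproduce the kernel's rehash code.

```c
#include <stddef.h>

/*
 * Return the table size to use for the current number of entries.
 * The size doubles once entries reach 75% of it; the test
 * entries / size >= 3 / 4 is done in integers as entries * 4 >= size * 3.
 */
static size_t
next_table_size(size_t entries, size_t size)
{
	if (entries * 4 >= size * 3)
		return (size * 2);
	return (size);
}

/* With a power-of-two size, the mask size - 1 selects a bucket. */
static size_t
bucket_index(size_t hash, size_t size)
{
	return (hash & (size - 1));
}
```

After the size doubles, every entry has to be rehashed into the new buckets, since the mask gains one bit.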


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected)
reduced the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
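
The receiver side of the TTL security check in 1.57 comes down to a single comparison. This is a minimal sketch that assumes a per-socket minimum where 0 means "disabled". The function names are hypothetical and only model the idea, not OpenBSD's tcp_input() code. A daemon would enable the check with setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl)), where `minttl` is an int.

```c
#include <stdbool.h>
#include <stdint.h>

/* The sending peer always transmits with the maximum TTL. */
#define GTSM_SEND_TTL	255

/*
 * Accept the packet only if its TTL is at least the configured
 * minimum; a minttl of 0 disables the check.
 */
static bool
minttl_accept(uint8_t received_ttl, uint8_t minttl)
{
	return minttl == 0 || received_ttl >= minttl;
}

/* Minimum TTL that tolerates at most 'hops' routers between peers. */
static uint8_t
minttl_for_hops(uint8_t hops)
{
	return (uint8_t)(GTSM_SEND_TTL - hops);
}
```

With `minttl_for_hops(0)` (255), only a directly connected peer passes, because every router in between decrements the TTL.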


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt
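
Reading the TTL that 1.56 delivers as ancillary data can be sketched with the standard CMSG_* macros after recvmsg(2). This assumes, as on OpenBSD, that the control message has level IPPROTO_IP, type IP_RECVTTL and a one-byte payload. Other systems differ (Linux, for instance, reports IP_TTL with an int payload), so treat the layout as an assumption. `find_recvttl` is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Scan the control buffer filled in by recvmsg(2) for the received TTL.
 * Returns the TTL (0-255), or -1 if no IP_RECVTTL message is present.
 * Assumes a one-byte payload as delivered by OpenBSD.
 */
static int
find_recvttl(struct msghdr *msg)
{
	struct cmsghdr *cm;
	unsigned char ttl;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_RECVTTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(ttl))) {
			memcpy(&ttl, CMSG_DATA(cm), sizeof(ttl));
			return ttl;
		}
	}
	return -1;
}
```

The socket must first opt in with setsockopt(s, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on)), where `on` is a non-zero int, and recvmsg(2) must be given a control buffer of at least CMSG_SPACE(1) bytes.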


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
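
The reversed match order in 1.48 can be modelled with two candidate listeners on one port, one bound to the wildcard address and one to localhost. The structure and function below are hypothetical and only show how the preference flips. They do not reproduce the hash walk in in_pcblookup_listen().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listener candidates for a single local port. */
struct listeners {
	void	*specific;	/* socket bound to e.g. 127.0.0.1, or NULL */
	void	*wildcard;	/* socket bound to INADDR_ANY, or NULL */
};

/*
 * Normally the more specific binding wins.  If pf tagged the packet as
 * translated to localhost, prefer the wildcard socket so that a
 * redirected connection does not look local to a double-binding daemon.
 */
static void *
pick_listener(const struct listeners *l, bool translated_to_localhost)
{
	void *first = l->specific, *second = l->wildcard;

	if (translated_to_localhost) {
		first = l->wildcard;
		second = l->specific;
	}
	return first != NULL ? first : second;
}
```

If only one of the two sockets exists, it is still found regardless of the tag; the tag changes only which one wins when both are bound.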


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
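
The infinite-loop fix mentioned in 1.3 can be illustrated with a bounded search over one port range, such as the "default" or "high" range chosen with IP_PORTRANGE. This is a sketch under stated assumptions. `struct port_range`, `pick_port` and the caller-supplied `in_use` map are hypothetical, and the real kernel also randomizes the start and consults other tables. The key property is that each candidate is tried at most once, so an exhausted range fails instead of spinning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive port range, e.g. a sysctl-configured range. */
struct port_range {
	uint16_t	first;
	uint16_t	last;
};

/*
 * Pick a free port in [first, last], starting the scan at 'hint'
 * (for example a random number).  'in_use' is a hypothetical map
 * indexed by port number.  Returns 0 when every port is taken.
 */
static uint16_t
pick_port(const struct port_range *r, uint16_t hint, const bool in_use[65536])
{
	uint32_t span, i, port;

	if (r->first > r->last)
		return 0;
	span = (uint32_t)r->last - r->first + 1;
	for (i = 0; i < span; i++) {
		/* Unsigned wrap-around still yields an offset in [0, span). */
		port = r->first + ((uint32_t)hint - r->first + i) % span;
		if (!in_use[port])
			return (uint16_t)port;
	}
	return 0;	/* exhausted: the caller reports an error */
}
```

Bounding the loop by the size of the range is what turns "ran out of ports" into an error return (for instance EADDRNOTAVAIL) rather than a hang.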


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.123 02-Mar-2022 bluhm

The return value of in6_pcbnotify() is never used. Make it a void
function.
OK gnezdo@ mvs@ florian@ sashan@


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
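
The upcall hook in 1.119 amounts to a function pointer that gets the first look at a packet and may consume it. This is a minimal sketch with hypothetical, reduced types: the real inp_upcall takes mbufs and IP headers, and its exact signature is not reproduced here. The convention assumed below is that returning NULL means the packet was stolen, and returning the packet lets normal delivery continue.

```c
#include <stddef.h>

/* Hypothetical packet type, reduced to what the sketch needs. */
struct pkt {
	int	len;
};

/* Hypothetical PCB with an optional upcall. */
struct pcb {
	/* Returns NULL if the packet was consumed, else the packet. */
	struct pkt	*(*inp_upcall)(void *arg, struct pkt *p);
	void		*inp_upcall_arg;
	int		 queued;	/* stand-in for the socket buffer */
};

/* Deliver a packet: let the upcall look first, then queue it. */
static void
pcb_deliver(struct pcb *inp, struct pkt *p)
{
	if (inp->inp_upcall != NULL) {
		p = inp->inp_upcall(inp->inp_upcall_arg, p);
		if (p == NULL)
			return;		/* stolen, e.g. by a tunnel driver */
	}
	inp->queued++;			/* would be appended to so_rcv */
}

/* Example upcall: steal every packet and count it in *(int *)arg. */
static struct pkt *
count_and_steal(void *arg, struct pkt *p)
{
	(void)p;
	(*(int *)arg)++;
	return NULL;
}
```

A driver such as wireguard would install its own function and state in `inp_upcall`/`inp_upcall_arg`; sockets without an upcall behave exactly as before.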


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect a detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
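
The reference counting introduced in 1.111 follows the usual pattern: each holder (the PCB queue and hashes, a pf mbuf header, a pf state key) owns one reference, and the last release frees the object. This sketch uses C11 atomics and hypothetical names. It is not the kernel's refcnt API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical PCB carrying only a reference count. */
struct pcb {
	atomic_uint	inp_refcnt;
};

/* Allocate with one reference, owned by the caller. */
static struct pcb *
pcb_alloc(void)
{
	struct pcb *inp = malloc(sizeof(*inp));

	if (inp != NULL)
		atomic_init(&inp->inp_refcnt, 1);
	return inp;
}

/* Take an extra reference, e.g. before storing the pointer elsewhere. */
static struct pcb *
pcb_ref(struct pcb *inp)
{
	atomic_fetch_add(&inp->inp_refcnt, 1);
	return inp;
}

/* Drop a reference; the last one frees the PCB.  Returns true if freed. */
static bool
pcb_unref(struct pcb *inp)
{
	if (atomic_fetch_sub(&inp->inp_refcnt, 1) != 1)
		return false;
	free(inp);
	return true;
}
```

Because the free happens only when the count reaches zero, a pf state key can safely keep its pointer after the socket is closed, as long as it holds its own reference.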


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-ifp
variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as the only objects referenced by the ipsec_ref structure, and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
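
A keyed hash like the one adopted in 1.87 keeps an attacker who does not know the secret key from choosing addresses and ports that all land in the same bucket. The kernel has its own SipHash implementation. The standalone SipHash-2-4 below follows the published reference algorithm so that the sketch is self-contained. `pcb_bucket` and its 12-byte IPv4 tuple layout are hypothetical illustrations, not the in_pcb.c hash input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

/* Load 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void
sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* SipHash-2-4: 2 compression rounds per block, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const void *data, size_t len)
{
	const uint8_t *p = data;
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
	};
	uint64_t m, b = (uint64_t)len << 56;	/* length in the top byte */
	size_t i, left = len & 7;

	for (i = 0; i + 8 <= len; i += 8) {	/* full 8-byte blocks */
		m = load_le64(p + i);
		v[3] ^= m;
		sipround(v);
		sipround(v);
		v[0] ^= m;
	}
	for (size_t j = 0; j < left; j++)	/* trailing bytes */
		b |= (uint64_t)p[i + j] << (8 * j);
	v[3] ^= b;
	sipround(v);
	sipround(v);
	v[0] ^= b;

	v[2] ^= 0xff;
	for (int r = 0; r < 4; r++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}

/* Hypothetical bucket selection for an IPv4 4-tuple; mask = size - 1. */
static size_t
pcb_bucket(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, size_t mask)
{
	uint8_t buf[12];

	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &fport, 2);
	memcpy(buf + 6, &laddr, 4);
	memcpy(buf + 10, &lport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

The key would be filled from a random source when the table is created. As 1.113 later notes, separate hashes should use separate keys.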


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
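
The growth rule in 1.86 (double the table once entries reach 75% of the bucket count) can be written with integer arithmetic only. This is a sketch with a hypothetical `struct pcbtable` that tracks just the bookkeeping. It keeps the size a power of two so the bucket index stays `hash & mask`, which matches the later inpt_mask naming in 1.109.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table bookkeeping; size is always a power of two. */
struct pcbtable {
	size_t	size;	/* number of buckets */
	size_t	mask;	/* size - 1, so bucket = hash & mask */
	size_t	count;	/* number of PCBs in the hash */
};

/* True once count reaches 75% of size, i.e. count / size >= 3 / 4. */
static bool
needs_resize(const struct pcbtable *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Account for one insertion and double the table when needed. */
static void
table_insert_account(struct pcbtable *t)
{
	t->count++;
	if (needs_resize(t)) {
		t->size *= 2;
		t->mask = t->size - 1;
		/* a real implementation would rehash every entry here */
	}
}
```

With 8 buckets the sixth insertion triggers the resize to 16, since 6 * 4 >= 8 * 3.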


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option at the IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
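
A bitmask with one bit per port, as described in 1.65, needs 65536 bits, which is 2048 32-bit words per protocol. The macro names below follow the DP_CLR() macro mentioned in 1.8. However, their bodies and the hypothetical `struct bad_ports` are assumptions and are not copied from in_pcb.h. The shifts use `1U` because shifting a signed 1 left by 31 is undefined behavior, the problem fixed in 1.122.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per port number, covering the whole 16-bit port space. */
#define DP_MAPBITS	(sizeof(uint32_t) * 8)
#define DP_MAPSIZE	(65536 / DP_MAPBITS)

/* Unsigned 1U keeps the shift by up to 31 bits well defined. */
#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (1U << ((p) % DP_MAPBITS)))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(1U << ((p) % DP_MAPBITS)))
#define DP_ISSET(m, p)	(((m)[(p) / DP_MAPBITS] & (1U << ((p) % DP_MAPBITS))) != 0)

/* Hypothetical per-protocol maps of ports never chosen dynamically. */
struct bad_ports {
	uint32_t	tcp[DP_MAPSIZE];
	uint32_t	udp[DP_MAPSIZE];
};

/* Would a dynamically chosen TCP port be skipped? */
static bool
tcp_port_is_bad(const struct bad_ports *b, uint16_t port)
{
	return DP_ISSET(b->tcp, port);
}
```

Port 587, reserved for sendmail in 1.23, would for instance be marked with `DP_SET(bp.tcp, 587)`.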


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option fo tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf balony.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two iocts used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep an policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
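The bounded search described in the 1.3 entry (pick from a range, never spin forever when every port is taken) can be sketched as follows. This is a minimal illustration, not the kernel's in_pcbpickport(): pick_port(), port_in_use_fn and bitmap_in_use() are hypothetical names, and treating a returned 0 as "range exhausted" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCB table lookup: is this local port taken? */
typedef bool (*port_in_use_fn)(uint16_t port, void *arg);

/*
 * Pick a free port in [first, last], starting at an offset derived from
 * `start` (e.g. a random number) and probing each candidate at most once.
 * Returns 0 when the whole range is busy, so the caller can fail the bind
 * instead of looping forever.
 */
uint16_t
pick_port(uint16_t first, uint16_t last, uint32_t start,
    port_in_use_fn in_use, void *arg)
{
	uint32_t span, offset, i;

	if (first > last) {		/* tolerate a reversed range */
		uint16_t tmp = first;
		first = last;
		last = tmp;
	}
	span = (uint32_t)last - first + 1;
	offset = start % span;
	for (i = 0; i < span; i++) {
		uint16_t cand = (uint16_t)(first + (offset + i) % span);
		if (!in_use(cand, arg))
			return cand;
	}
	return 0;			/* every port in the range is in use */
}

/* Example in_use(): arg points at a 65536-bit map, one bit per port. */
bool
bitmap_in_use(uint16_t port, void *arg)
{
	const uint8_t *map = arg;
	return (map[port / 8] >> (port % 8)) & 1;
}
```

Because the loop runs at most `span` times, a full range costs one pass and then reports failure, which is the property the quoted commit message is after.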


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.122 20-Jan-2022 bluhm

Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
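The idea behind inp_upcall/inp_upcall_arg in 1.119 is a per-PCB hook that sees a packet before it reaches the socket buffer and may keep it. The sketch below models that with hypothetical types (struct pkt, struct pcb, pcb_deliver()); the convention that a hook returns NULL to "steal" the packet is an assumption here, not the kernel's documented signature.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf. */
struct pkt {
	int len;
};

/* Hook: return the packet to continue normal delivery, NULL if consumed. */
typedef struct pkt *(*pcb_upcall_fn)(void *arg, struct pkt *p);

/* Hypothetical, simplified stand-in for an inpcb. */
struct pcb {
	pcb_upcall_fn	 upcall;	/* may be NULL */
	void		*upcall_arg;	/* passed back to the hook */
	int		 queued;	/* stand-in for the socket buffer */
};

/* Deliver p to the socket, giving the upcall the first look. */
void
pcb_deliver(struct pcb *pcb, struct pkt *p)
{
	if (pcb->upcall != NULL) {
		p = pcb->upcall(pcb->upcall_arg, p);
		if (p == NULL)
			return;		/* stolen by the upcall */
	}
	pcb->queued++;			/* would append to the socket buffer */
}

/* Example hook: steal packets longer than *(int *)arg bytes. */
struct pkt *
steal_long(void *arg, struct pkt *p)
{
	const int *limit = arg;
	return (p->len > *limit) ? NULL : p;
}
```

Passing an opaque `arg` alongside the function pointer lets a tunnel driver find its own softc without the UDP code knowing its type.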


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
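The 1.111 entry lists several holders of an inp reference (queue and hashes, a pf mbuf header, a pf state key). A minimal sketch of that pattern using C11 atomics follows; it does not use the kernel's own refcount interface, and struct rpcb and the rpcb_* functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical PCB carrying only a reference count. */
struct rpcb {
	atomic_uint	refs;
};

/* Allocate with one reference, e.g. the one held by the PCB table. */
struct rpcb *
rpcb_alloc(void)
{
	struct rpcb *p = malloc(sizeof(*p));
	if (p != NULL)
		atomic_init(&p->refs, 1);
	return p;
}

/* Take an extra reference, e.g. when a pf state key points at the PCB. */
struct rpcb *
rpcb_ref(struct rpcb *p)
{
	atomic_fetch_add_explicit(&p->refs, 1, memory_order_relaxed);
	return p;
}

/* Drop a reference; whoever drops the last one frees. True if freed. */
bool
rpcb_unref(struct rpcb *p)
{
	if (atomic_fetch_sub_explicit(&p->refs, 1, memory_order_acq_rel) != 1)
		return false;
	free(p);
	return true;
}
```

The acquire/release ordering on the decrement makes writes by earlier holders visible to the thread that performs the final free.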


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
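SipHash is a keyed pseudorandom function, so without the secret key a remote peer cannot pick addresses and ports that all land in one bucket. Below is a portable SipHash-2-4 written from the published algorithm (not the kernel's SipHash interface), plus a hypothetical pcb_bucket() that masks the hash with a power-of-two mask, in the spirit of the inpt_mask field mentioned in 1.109. struct tuple4 and pcb_bucket() are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))
#define SIPROUND do {							\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

static uint64_t
load64_le(const uint8_t *p)
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of data[0..len) under the 128-bit key k. */
uint64_t
siphash24(const uint8_t k[16], const void *data, size_t len)
{
	const uint8_t *in = data;
	uint64_t k0 = load64_le(k), k1 = load64_le(k + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t m, b = (uint64_t)len << 56;	/* length byte on top */
	size_t i, tail = len & 7;

	for (i = 0; i + 8 <= len; i += 8) {	/* full 8-byte blocks */
		m = load64_le(in + i);
		v3 ^= m; SIPROUND; SIPROUND; v0 ^= m;
	}
	for (size_t j = 0; j < tail; j++)	/* last 0..7 bytes */
		b |= (uint64_t)in[i + j] << (8 * j);
	v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
	v2 ^= 0xff;				/* finalization */
	SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical IPv4 4-tuple used as the hash input. */
struct tuple4 {
	uint32_t faddr, laddr;
	uint16_t fport, lport;
};

/* Bucket index in a table of mask + 1 buckets (mask = 2^n - 1). */
size_t
pcb_bucket(const uint8_t key[16], const struct tuple4 *t, size_t mask)
{
	uint8_t buf[12];

	/* Serialize field by field so struct padding never reaches the hash. */
	memcpy(buf, &t->faddr, 4);
	memcpy(buf + 4, &t->laddr, 4);
	memcpy(buf + 8, &t->fport, 2);
	memcpy(buf + 10, &t->lport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

The key would be chosen randomly once per table; 1.113 later gives the address hash and the local-port hash separate keys.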


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
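The 75% threshold from 1.86 can be checked with integer arithmetic only. A minimal sketch, assuming a power-of-two table so that the hash mask (size - 1) stays valid after each doubling; hash_needs_grow() and hash_next_size() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* True once count reaches 75% of size: count >= 3/4 * size, no floats. */
bool
hash_needs_grow(size_t count, size_t size)
{
	return count * 4 >= size * 3;
}

/* Double a power-of-two size until count is below the 75% mark. */
size_t
hash_next_size(size_t count, size_t size)
{
	while (hash_needs_grow(count, size))
		size *= 2;
	return size;
}
```

For a 64-bucket table, the 48th entry triggers a resize to 128 buckets.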


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
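Covering the whole port space with one bit per port takes 65536 / 32 = 2048 32-bit words (8 KB) per protocol. The sketch below uses hypothetical names rather than the header's own macros and assumes a 32-bit unsigned int. It shifts 1U rather than 1, because 1 << 31 on a signed int is undefined behavior, the problem later fixed in 1.122.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORTMAP_WORDBITS	32
#define PORTMAP_WORDS		(65536 / PORTMAP_WORDBITS)	/* 2048 */

/* One bit per port: bit (p % 32) of word (p / 32). */
struct portmap {
	uint32_t w[PORTMAP_WORDS];
};

static inline void
portmap_set(struct portmap *m, uint16_t p)
{
	m->w[p / PORTMAP_WORDBITS] |= 1U << (p % PORTMAP_WORDBITS);
}

static inline void
portmap_clr(struct portmap *m, uint16_t p)
{
	m->w[p / PORTMAP_WORDBITS] &= ~(1U << (p % PORTMAP_WORDBITS));
}

static inline bool
portmap_isset(const struct portmap *m, uint16_t p)
{
	return (m->w[p / PORTMAP_WORDBITS] >> (p % PORTMAP_WORDBITS)) & 1U;
}
```

Port 65535 maps to bit 31 of word 2047, exactly the case where a signed shift would overflow.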


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
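The linkage in 1.64 is a pair of back pointers that must be cleared together, whichever side is removed first. A minimal sketch with hypothetical stand-in types (struct statekey, struct lpcb); the real pf and PCB structures carry much more state.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for a pf state key and a PCB. */
struct statekey;
struct lpcb {
	struct statekey	*sk;	/* cached state key, or NULL */
};
struct statekey {
	struct lpcb	*pcb;	/* cached PCB, or NULL */
};

/* After the first lookup found both, remember each from the other side. */
void
link_sk_pcb(struct statekey *sk, struct lpcb *pcb)
{
	sk->pcb = pcb;
	pcb->sk = sk;
}

/* Removing the PCB clears the state key's pointer to it. */
void
pcb_detach(struct lpcb *pcb)
{
	if (pcb->sk != NULL) {
		pcb->sk->pcb = NULL;
		pcb->sk = NULL;
	}
}

/* Removing the state key clears the PCB's pointer to it. */
void
statekey_detach(struct statekey *sk)
{
	if (sk->pcb != NULL) {
		sk->pcb->sk = NULL;
		sk->pcb = NULL;
	}
}
```

With both pointers set, inbound packets can read sk->pcb and outbound packets pcb->sk, skipping the respective lookup; a NULL pointer means "fall back to the normal lookup".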


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
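The receive-side rule behind the TTL security technique in 1.57 is a single comparison: the sender uses TTL 255, so a peer at most `hops` routers away arrives with TTL >= 255 - hops. A minimal sketch with hypothetical function names; a minimum of 0 lets every packet through, which models the option being unset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only packets whose TTL is at least the socket's minimum. */
bool
ttl_acceptable(uint8_t ttl, uint8_t minttl)
{
	return ttl >= minttl;
}

/* Minimum TTL to configure for a peer at most `hops` routers away. */
uint8_t
minttl_for_hops(uint8_t hops)
{
	return (uint8_t)(255 - hops);
}
```

A directly connected BGP peer would use minttl_for_hops(0) == 255, so any packet that crossed a router (TTL 254 or lower) is dropped.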


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
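The reversed match order from 1.48 can be sketched as a two-step lookup whose order depends on the "translated to localhost" tag. This is a simplified model, not in_pcblookup_listen(): struct listener, find_bound() and listen_lookup() are hypothetical, and laddr 0 stands for a socket bound to the wildcard address (*).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical listener entry; laddr 0 means bound to * (wildcard). */
struct listener {
	uint32_t laddr;
	uint16_t lport;
};

static const struct listener *
find_bound(const struct listener *tab, size_t n, uint32_t laddr,
    uint16_t lport)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].lport == lport && tab[i].laddr == laddr)
			return &tab[i];
	return NULL;
}

/*
 * Normal order: exact-address match first, then wildcard. For packets
 * tagged as translated to localhost, try the wildcard first, so a daemon
 * bound to both * and 127.0.0.1 does not treat redirected traffic as local.
 */
const struct listener *
listen_lookup(const struct listener *tab, size_t n, uint32_t dst,
    uint16_t dport, bool translated_to_localhost)
{
	uint32_t first = translated_to_localhost ? 0 : dst;
	uint32_t second = translated_to_localhost ? dst : 0;
	const struct listener *l;

	if ((l = find_bound(tab, n, first, dport)) != NULL)
		return l;
	return find_bound(tab, n, second, dport);
}
```

If only the localhost socket exists, a tagged packet still reaches it on the second step; the change only affects which socket wins when both are bound.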


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code, so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
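
The IP_PORTRANGE selection described above is requested from userland. A
minimal sketch, assuming the IP_PORTRANGE and IP_PORTRANGE_HIGH constants
from <netinet/in.h> as found on OpenBSD and FreeBSD; the helper name
bind_ephemeral_high() is hypothetical. On systems that lack IP_PORTRANGE
the #ifdef skips the request and the kernel's default range applies.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind socket s to a zero port on loopback,
 * asking for the "high" range first where IP_PORTRANGE exists.
 * Returns the kernel-chosen port in host byte order, or -1 on error.
 */
int
bind_ephemeral_high(int s)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

#ifdef IP_PORTRANGE
	int range = IP_PORTRANGE_HIGH;

	if (setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &range,
	    sizeof(range)) == -1)
		return -1;
#endif
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* zero port: the kernel picks one */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		return -1;

	/* Ask the kernel which port it actually selected. */
	if (getsockname(s, (struct sockaddr *)&sin, &len) == -1)
		return -1;
	return ntohs(sin.sin_port);
}
```

On OpenBSD the returned port should fall between net.inet.ip.porthifirst
and net.inet.ip.porthilast; elsewhere it comes from whatever default
ephemeral range the system uses.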


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.121 25-Jan-2021 dlg

if stoeplitz is enabled, use it to provide a flowid for tcp packets.

drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.


Revision tags: OPENBSD_6_8_BASE
# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(). This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Input from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
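
The growth rule above is a load-factor check. A sketch, assuming the
bucket count is a power of two so that size - 1 can serve as the hash
mask (compare inpt_mask in 1.109); pcbhash_next_size() is a hypothetical
helper, not the kernel's function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the bucket count the table should have,
 * given nentries current entries and size buckets. The table doubles
 * once nentries reaches 75% of size; 4n >= 3s avoids floating point.
 */
size_t
pcbhash_next_size(size_t nentries, size_t size)
{
	if (nentries * 4 >= size * 3)
		return size * 2;
	return size;
}
```

With 128 buckets, the table stays at 128 for 95 entries and doubles to
256 (new mask 255) once the count reaches 96.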


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add the IP_RECVRTABLE socket option, used at the IPPROTO_IP
level, which allows one to retrieve with recvmsg(2) the original
routing domain of UDP datagrams diverted by pf via "divert-to".

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
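
Once the map covers all 65536 ports, a bitmap of 32-bit words needs
65536 / 32 = 2048 words (8 KiB). A minimal sketch of such a map, using
hypothetical names; the header's own DP_* macros (DP_CLR() among them,
per 1.8) are not reproduced here.

```c
#include <stdint.h>

#define PORTMAP_WORDBITS	32
#define PORTMAP_WORDS		(65536 / PORTMAP_WORDBITS)	/* 2048 */

/* Hypothetical: one bit per port, set means "never pick dynamically". */
struct portmap {
	uint32_t	words[PORTMAP_WORDS];
};

static inline void
portmap_set(struct portmap *m, uint16_t port)
{
	m->words[port / PORTMAP_WORDBITS] |=
	    UINT32_C(1) << (port % PORTMAP_WORDBITS);
}

static inline void
portmap_clr(struct portmap *m, uint16_t port)
{
	m->words[port / PORTMAP_WORDBITS] &=
	    ~(UINT32_C(1) << (port % PORTMAP_WORDBITS));
}

static inline int
portmap_isset(const struct portmap *m, uint16_t port)
{
	return (m->words[port / PORTMAP_WORDBITS] >>
	    (port % PORTMAP_WORDBITS)) & 1;
}
```

A dynamic port picker would then skip any candidate for which
portmap_isset() is true, e.g. TCP 587 once it is marked reserved.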


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
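
The pointer caching described above can be sketched as two structures
that point at each other and are cleared from either side. This is a
simplified illustration with hypothetical names; it ignores locking and
the reference counting that 1.111 later added for these references.

```c
#include <stddef.h>

struct statekey;

/* Hypothetical stand-in for an internet PCB. */
struct pcb {
	struct statekey	*sk;	/* cached pf state key, or NULL */
};

/* Hypothetical stand-in for a pf state key. */
struct statekey {
	struct pcb	*inp;	/* cached PCB, or NULL */
};

/* After the first successful lookup, remember each side in the other. */
void
pcb_link_statekey(struct pcb *inp, struct statekey *sk)
{
	inp->sk = sk;
	sk->inp = inp;
}

/* The PCB is going away: clear both pointers so neither dangles. */
void
pcb_detach(struct pcb *inp)
{
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}

/* The state key is going away: same, from the other direction. */
void
statekey_detach(struct statekey *sk)
{
	if (sk->inp != NULL) {
		sk->inp->sk = NULL;
		sk->inp = NULL;
	}
}
```

While linked, an inbound packet carrying the state key can use sk->inp
directly instead of a hash lookup, and an outbound packet can use
inp->sk instead of a state key lookup.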


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees only RFC3542 symbols in the header files,
so new code has to use the RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
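
On the receive side the TTL security idea reduces to one comparison. A
sketch, assuming a per-socket floor where 0 means "check disabled"; the
function names are hypothetical and this is not the kernel's input path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: accept unless a floor is set and the TTL fell below it. */
bool
minttl_accept(uint8_t ttl, uint8_t min_ttl)
{
	return min_ttl == 0 || ttl >= min_ttl;
}

/*
 * Hypothetical: floor to configure when the peer sends with TTL 255
 * and at most `hops` routers may sit between the two ends.
 */
uint8_t
minttl_for_hops(uint8_t hops)
{
	return (uint8_t)(255 - hops);
}
```

For a directly connected peer the floor is 255, so a packet that crossed
even one router (arriving with TTL 254) is dropped.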


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
bringed into soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.120 21-Jun-2020 dlg

knf: the inp_upcall line was too long.


# 1.119 21-Jun-2020 dlg

add a inp_upcall function pointer and inp_upcall_arg to struct in_pcb.

this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.

i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.


Revision tags: OPENBSD_6_7_BASE
# 1.118 13-Nov-2019 deraadt

Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces the parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
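Revision 1.87 switches the PCB hash to SipHash. The hash is keyed by a random secret, so a remote peer cannot choose addresses and ports that all land in one bucket. Below is a standalone SipHash-2-4 sketch for illustration only; OpenBSD ships its own SipHash routines in the kernel. The bucket helper `tuple_bucket` and its field layout are hypothetical, and addresses and ports are hashed in host byte order purely for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: the add-rotate-xor mixing step of SipHash. */
#define SIPROUND(v0, v1, v2, v3) do {					\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Read 8 bytes as a little-endian 64-bit word. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4 of buf[0..len) under a 16-byte secret key. */
static uint64_t
siphash24(const uint8_t key[16], const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t b = (uint64_t)len << 56;	/* length mod 256 in top byte */
	size_t left = len;

	/* Compression: two rounds per full 8-byte block. */
	for (; left >= 8; p += 8, left -= 8) {
		uint64_t m = load_le64(p);

		v3 ^= m;
		SIPROUND(v0, v1, v2, v3);
		SIPROUND(v0, v1, v2, v3);
		v0 ^= m;
	}
	/* Last block: leftover bytes plus the length byte. */
	for (size_t i = 0; i < left; i++)
		b |= (uint64_t)p[i] << (8 * i);
	v3 ^= b;
	SIPROUND(v0, v1, v2, v3);
	SIPROUND(v0, v1, v2, v3);
	v0 ^= b;

	/* Finalization: four rounds. */
	v2 ^= 0xff;
	for (int i = 0; i < 4; i++)
		SIPROUND(v0, v1, v2, v3);
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical: map an IPv4 4-tuple to one of (mask + 1) buckets. */
static size_t
tuple_bucket(const uint8_t key[16], uint32_t laddr, uint16_t lport,
    uint32_t faddr, uint16_t fport, size_t mask)
{
	uint8_t buf[12];

	memcpy(buf, &laddr, 4);
	memcpy(buf + 4, &faddr, 4);
	memcpy(buf + 8, &lport, 2);
	memcpy(buf + 10, &fport, 2);
	return (size_t)(siphash24(key, buf, sizeof(buf)) & mask);
}
```

Revision 1.113 later gives the address hash and the local-port hash separate keys. In this sketch, that corresponds to passing two different `key` arrays.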


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
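The growth rule in revision 1.86 (double the table once entries reach 75% of its size) can be checked in integer arithmetic. A minimal sketch follows; it assumes a power-of-two bucket count, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical: should a table with `size` buckets (a power of two)
 * grow now that it holds `count` entries?  count >= 0.75 * size,
 * computed as count * 4 >= size * 3 to avoid floating point.
 */
static bool
table_should_grow(size_t count, size_t size)
{
	assert(size > 0 && (size & (size - 1)) == 0);
	return count * 4 >= size * 3;
}

/* Bucket count after an insert: doubled if the threshold is reached. */
static size_t
table_size_after_insert(size_t count, size_t size)
{
	return table_should_grow(count, size) ? size * 2 : size;
}
```

With a power-of-two size, a bucket index is simply `hash & (size - 1)`. That is the kind of hash mask the log later calls inpt_mask (revision 1.109).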


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
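Revision 1.65 widens the baddynamic bitmaps to one bit per port across the full 65536-port space. The sketch below uses one bit per port in 32-bit words, giving 2048 words or 8 KiB per protocol. All names here are hypothetical and only loosely modeled on the DP_* macros mentioned elsewhere in this log. The port numbers in the usage example are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORTMAP_BITS	32				/* bits per word */
#define PORTMAP_WORDS	(65536 / PORTMAP_BITS)		/* 2048 words */

/* Set, clear, and test the bit for port p in bitmap m. */
#define PM_SET(m, p)	((m)[(p) / PORTMAP_BITS] |= (UINT32_C(1) << ((p) % PORTMAP_BITS)))
#define PM_CLR(m, p)	((m)[(p) / PORTMAP_BITS] &= ~(UINT32_C(1) << ((p) % PORTMAP_BITS)))
#define PM_ISSET(m, p)	(((m)[(p) / PORTMAP_BITS] >> ((p) % PORTMAP_BITS)) & 1U)

/* Hypothetical per-protocol table of ports never handed out dynamically. */
struct baddynamic {
	uint32_t map[PORTMAP_WORDS];
};

/* Start from an empty map and mark each listed port as reserved. */
static void
baddynamic_init(struct baddynamic *bd, const uint16_t *ports, size_t n)
{
	assert(bd != NULL);
	memset(bd->map, 0, sizeof(bd->map));
	for (size_t i = 0; i < n; i++)
		PM_SET(bd->map, ports[i]);
}
```

Before this change, only ports 512 to 1024 were covered. Extending the map to all ports lets entries such as the X11 range (revision 1.66) be listed.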


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
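Revision 1.64 describes a pair of cross-pointers between a PCB and a pf state key, cleared from whichever side goes away first. The sketch below models only those pointers. The struct and function names are hypothetical, and locking and the reference counting added later (revision 1.111) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct statekey;

/* Hypothetical stand-ins; only the cross-pointers are modeled. */
struct pcb {
	int		 id;
	struct statekey	*sk;	/* cached pf state key, or NULL */
};

struct statekey {
	struct pcb	*inp;	/* cached pcb, or NULL */
};

/* After the first full pcb lookup, remember the pair in both directions. */
static void
pcb_statekey_link(struct pcb *inp, struct statekey *sk)
{
	assert(inp != NULL && sk != NULL);
	inp->sk = sk;
	sk->inp = inp;
}

/* The pcb is being removed: clear the state key's pointer to it. */
static void
pcb_detach(struct pcb *inp)
{
	if (inp->sk != NULL)
		inp->sk->inp = NULL;
	inp->sk = NULL;
}

/* The state key is being removed: clear the pcb's pointer to it. */
static void
statekey_detach(struct statekey *sk)
{
	if (sk->inp != NULL)
		sk->inp->sk = NULL;
	sk->inp = NULL;
}
```

On later inbound packets, a non-NULL `sk->inp` replaces the PCB hash lookup. On outbound packets, `inp->sk` replaces the state key lookup.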


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
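The quoted change replaced a search that could loop forever with one that stops when the range is exhausted. The sketch below shows that idea. It is not the actual FreeBSD or OpenBSD code, and the names (`pick_port`, `port_in_use_fn`, `in_port_list`) are hypothetical. Each port in [first, last] is tried at most once, starting at `start` and wrapping around, and 0 is returned when every port is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: is this local port already in use? */
typedef bool (*port_in_use_fn)(uint16_t port, void *arg);

/* Pick a free port in [first, last]; 0 means the range is exhausted. */
static uint16_t
pick_port(uint16_t first, uint16_t last, uint16_t start,
    port_in_use_fn in_use, void *arg)
{
	assert(first > 0 && first <= last);
	uint32_t count = (uint32_t)last - first + 1;
	uint32_t off = (start >= first && start <= last) ?
	    (uint32_t)start - first : 0;

	/* Bounded: at most `count` candidates, so no infinite loop. */
	for (uint32_t tries = 0; tries < count; tries++) {
		uint16_t port = (uint16_t)(first + (off + tries) % count);

		if (!in_use(port, arg))
			return port;
	}
	return 0;
}

/* Example predicate over a small list of taken ports. */
struct port_list {
	const uint16_t	*ports;
	size_t		 n;
};

static bool
in_port_list(uint16_t port, void *arg)
{
	const struct port_list *pl = arg;

	for (size_t i = 0; i < pl->n; i++)
		if (pl->ports[i] == port)
			return true;
	return false;
}
```

The range bounds would correspond to the sysctl-settable limits named above, such as net.inet.ip.portfirst and net.inet.ip.portlast. A caller treats a 0 result as an error, for example address-in-use.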


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving arround, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the incpb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would ease us implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (to be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.117 17-Oct-2019 dlg

in6_setsockaddr and in6_setpeeraddr can't fail, so let them return void.

this also brings them in line with the AF_INET equivalents.

ok visa@ bluhm@


Revision tags: OPENBSD_6_6_BASE
# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(). This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain save and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks that no router on the way (or no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
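The receiver-side rule reduces to a single comparison. The sketch below is a hypothetical illustration (the `struct sock_opts` type and `minttl_accept()` are invented names), not the kernel's TCP input code. A userland program would enable the check with something like `setsockopt(s, IPPROTO_IP, IP_MINTTL, &v, sizeof(v))`; check ip(4) for the exact argument type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state; the default of 0 accepts every packet. */
struct sock_opts {
	uint8_t minttl;
};

/*
 * Receiver-side check: drop the packet if its TTL has fallen below the
 * configured floor. With the sender using TTL 255, minttl 255 admits only
 * directly connected peers and minttl 254 tolerates one router hop.
 */
static bool
minttl_accept(const struct sock_opts *so, uint8_t ttl)
{
	return ttl >= so->minttl;
}
```

An off-path attacker several hops away cannot make a packet arrive with TTL 255, because every router on the way decrements it.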


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped addresses,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
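The deadlock fix amounts to bounding the search: try each port in the configured range at most once and report failure when all are taken. This is a minimal sketch that assumes a hypothetical `in_use()` callback in place of the kernel's PCB table lookup and a caller-supplied starting offset (for example a random number). It is not the actual in_pcb.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return a free port in [first, last] (assumes
 * 1 <= first <= last), starting at an arbitrary offset such as a random
 * number. Each candidate is tried at most once, so an exhausted range
 * ends the search with 0 instead of spinning forever.
 */
static uint16_t
pick_port(uint16_t first, uint16_t last, uint32_t start,
    bool (*in_use)(uint16_t))
{
	uint32_t span = (uint32_t)last - first + 1;
	uint32_t i;

	for (i = 0; i < span; i++) {
		/* start % span < 65536, so the sum below cannot overflow */
		uint16_t p = (uint16_t)(first + (start % span + i) % span);
		if (!in_use(p))
			return p;
	}
	return 0;	/* no free port: the caller would report an error */
}
```

With a range such as 49152-65535 and a random `start`, each call scans forward from a random point and wraps around. A fully used range returns 0 instead of looping forever.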


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.116 15-Jul-2019 bluhm

Initialize struct inpcb pool not on demand, but during initialization.
Removes a global variable and avoids MP problems.
OK mpi@ visa@


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
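This is a minimal sketch of the idea using C11 atomics. It assumes a hypothetical `struct rpcb` and invented helper names. The kernel uses its own refcnt API (hence the sys/refcnt.h include mentioned in 1.112), not this code. Each holder named in the commit message (the PCB table, a pf mbuf header, a pf state key) takes one reference, and the object is freed only when the last one is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted PCB. */
struct rpcb {
	atomic_uint refs;
	int         id;
};

static struct rpcb *
rpcb_alloc(int id)
{
	struct rpcb *p = malloc(sizeof *p);

	if (p == NULL)
		return NULL;
	atomic_init(&p->refs, 1);	/* reference held by the PCB table */
	p->id = id;
	return p;
}

/* Take an additional reference, e.g. for a pf state key. */
static struct rpcb *
rpcb_ref(struct rpcb *p)
{
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Returns true if this call dropped the last reference and freed the PCB. */
static bool
rpcb_unref(struct rpcb *p)
{
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		free(p);
		return true;
	}
	return false;
}
```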


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@
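For illustration, here is a standalone reimplementation of SipHash-2-4 that follows the published reference algorithm. It comes with a hypothetical `pcb_bucket()` wrapper that hashes an IPv4 address/port 4-tuple with a secret key and masks the result to a bucket index. The kernel has its own SipHash routines and hashes its own fields, so the wrapper's layout and names are assumptions. Because the key is secret, a remote attacker cannot predict which tuples collide, and that is what defeats bucket flooding. Revision 1.113 later gave the address hash and the local-port hash separate keys.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, b)	(((x) << (b)) | ((x) >> (64 - (b))))

#define SIPROUND do {							\
	v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);	\
	v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;			\
	v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;			\
	v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);	\
} while (0)

/* Little-endian 64-bit load, independent of host byte order. */
static uint64_t
le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* SipHash-2-4: 2 compression rounds per 8-byte block, 4 finalization rounds. */
static uint64_t
siphash24(const uint8_t key[16], const void *data, size_t len)
{
	const uint8_t *in = data;
	uint64_t k0 = le64(key), k1 = le64(key + 8);
	uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
	uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
	uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
	uint64_t v3 = k1 ^ 0x7465646279746573ULL;
	uint64_t m, b = (uint64_t)len << 56;
	size_t i, j;

	for (i = 0; i + 8 <= len; i += 8) {
		m = le64(in + i);
		v3 ^= m;
		SIPROUND; SIPROUND;
		v0 ^= m;
	}
	for (j = 0; i + j < len; j++)		/* 0..7 trailing bytes */
		b |= (uint64_t)in[i + j] << (8 * j);
	v3 ^= b;
	SIPROUND; SIPROUND;
	v0 ^= b;
	v2 ^= 0xff;
	SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical PCB bucket index for an IPv4 4-tuple; mask = buckets - 1. */
static uint32_t
pcb_bucket(const uint8_t key[16], uint32_t faddr, uint16_t fport,
    uint32_t laddr, uint16_t lport, uint32_t mask)
{
	uint8_t buf[12];

	memcpy(buf, &faddr, 4);
	memcpy(buf + 4, &laddr, 4);
	memcpy(buf + 8, &fport, 2);
	memcpy(buf + 10, &lport, 2);
	return (uint32_t)siphash24(key, buf, sizeof buf) & mask;
}
```

The key would be filled from a random source at startup. With the reference test key (bytes 0..15) and message (bytes 0..14), siphash24() returns 0xa129ca6149be45e5.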


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
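The growth rule can be checked using only integer arithmetic. This minimal sketch assumes a hypothetical `struct table_stats` that tracks the bucket count and the number of entries, and it leaves out the actual rehashing. It also shows why a power-of-two size lets the hash mask be `size - 1` (compare the inpt_mask/inpt_size fields in 1.109).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table bookkeeping; size is always a power of two. */
struct table_stats {
	size_t size;	/* number of hash buckets */
	size_t count;	/* number of PCBs in the table */
};

/* count >= 0.75 * size, evaluated without floating point. */
static bool
needs_grow(const struct table_stats *t)
{
	return t->count * 4 >= t->size * 3;
}

static void
grow_if_needed(struct table_stats *t)
{
	if (needs_grow(t))
		t->size *= 2;	/* entries would be rehashed into the new buckets here */
}

/* Valid only because size is a power of two. */
static size_t
bucket_mask(const struct table_stats *t)
{
	return t->size - 1;
}
```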


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
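This is a minimal sketch of a bitmap that covers the full port space, one bit per port. The type and macro names (`struct baddyn_map`, `PM_SET`, `PM_CLR`, `PM_ISSET`) are hypothetical and only modeled on the DP_CLR()-style macros mentioned in 1.8. The real definitions are in in_pcb.h.

```c
#include <stdint.h>

/* One bit per port across the full 0..65535 range. */
#define PORT_BITS	32u
#define PORT_WORDS	(65536u / PORT_BITS)	/* 2048 words = 8 KiB per protocol */

struct baddyn_map {
	uint32_t tcp[PORT_WORDS];
	uint32_t udp[PORT_WORDS];
};

/* p must be an unsigned port number in 0..65535. */
#define PM_SET(m, p)	((m)[(p) / PORT_BITS] |=  (1u << ((p) % PORT_BITS)))
#define PM_CLR(m, p)	((m)[(p) / PORT_BITS] &= ~(1u << ((p) % PORT_BITS)))
#define PM_ISSET(m, p)	(((m)[(p) / PORT_BITS] >> ((p) % PORT_BITS)) & 1u)
```

At 32 bits per word, 65536 ports need 2048 words, so each protocol's map grows to 8 KiB. Because the structure is larger, sysctl had to be rebuilt against the new headers. Marking an X11 port such as 6000 in the TCP map would then be `PM_SET(map.tcp, 6000u)`.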


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option fo tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf balony.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two iocts used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep an policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
does weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

amove in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

witch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 addres in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this will make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now it is not only
unnecessary but could also create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api; the kernel notifies (EMT_REQUESTSA)
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
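The bounded search that replaces the infinite loop can be sketched as follows. This is an illustration, not the FreeBSD or OpenBSD code: `pick_port()`, `port_busy()` and the one-bit-per-port busy map are hypothetical, and the real code also randomizes the starting point and skips ports on the baddynamic list. The property that matters is that every candidate in [first, last] is tried at most once, so an exhausted range returns `EADDRINUSE` instead of spinning forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PORT_WORDS (65536 / 32)	/* hypothetical busy map: one bit per port */

/* Nonzero if 'port' is marked busy in the bitmap. */
static int
port_busy(const uint32_t busy[PORT_WORDS], uint16_t port)
{
	return (busy[port / 32] >> (port % 32)) & 1;
}

/*
 * Pick a free port in [first, last], scanning from 'start' and wrapping
 * around. Each candidate is examined exactly once, so the loop always
 * terminates even when the whole range is in use.
 */
int
pick_port(const uint32_t busy[PORT_WORDS], uint16_t first, uint16_t last,
    uint16_t start, uint16_t *result)
{
	uint32_t span, i;

	assert(first <= last && start >= first && start <= last);
	span = (uint32_t)last - first + 1;	/* number of candidates */
	for (i = 0; i < span; i++) {
		uint16_t port = (uint16_t)(first + (start - first + i) % span);

		if (!port_busy(busy, port)) {
			*result = port;
			return 0;
		}
	}
	return EADDRINUSE;	/* every candidate tried once: give up */
}
```

Selecting the "default", "high" or "low" range via IP_PORTRANGE would only change the `first`/`last` bounds handed to such a search; the termination argument stays the same.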


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.115 04-Oct-2018 bluhm

Revert the inpcb table mutex commit. It triggers a witness panic
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
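A minimal sketch of the reference-counting idea, assuming C11 atomics rather than the kernel's own refcnt interface from sys/refcnt.h: `struct pcb`, `pcb_alloc()`, `pcb_ref()` and `pcb_unref()` are hypothetical names. The table holds the initial reference; each additional holder, such as a pf mbuf header or a pf state key, takes its own, and whoever drops the last one frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inpcb carrying a reference count. */
struct pcb {
	atomic_uint	refs;
	int		lport;
};

/* Allocate with one reference, owned by the PCB table and hashes. */
struct pcb *
pcb_alloc(int lport)
{
	struct pcb *p = malloc(sizeof(*p));

	if (p == NULL)
		return NULL;
	atomic_init(&p->refs, 1);
	p->lport = lport;
	return p;
}

/* Take an extra reference, e.g. for a pf state key or mbuf header. */
struct pcb *
pcb_ref(struct pcb *p)
{
	unsigned int old = atomic_fetch_add(&p->refs, 1);

	assert(old > 0);	/* must not revive an object being freed */
	return p;
}

/* Drop a reference; returns true if this call freed the PCB. */
bool
pcb_unref(struct pcb *p)
{
	unsigned int old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(p);
		return true;
	}
	return false;
}
```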


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violations due to the use of
per-ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer; OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
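The growth rule described above can be sketched as an integer load-factor check. `struct table`, `needs_grow()` and `table_insert()` are hypothetical; the bucket count is assumed to be a power of two (the table keeps a hash mask), and rehashing the existing entries into the new buckets is left out. Comparing `count * 4 >= size * 3` tests the 75% threshold without floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping only; a real PCB table also holds buckets. */
struct table {
	size_t	size;	/* number of hash buckets (assumed power of two) */
	size_t	count;	/* number of entries stored */
};

/* True once the entry count reaches 75% of the bucket count. */
static int
needs_grow(const struct table *t)
{
	return t->count * 4 >= t->size * 3;
}

/* Record one insertion and double the bucket count at the threshold. */
void
table_insert(struct table *t)
{
	assert(t->size > 0);
	t->count++;
	if (needs_grow(t))
		t->size *= 2;	/* a real table would rehash every entry here */
}
```

With 4 buckets, the third insertion reaches 3/4 and doubles the table to 8; the next doubling happens at the sixth entry.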


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables, separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
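A sketch of a bitmask covering the full 65536-port space, similar in spirit to the DP_CLR() style macros this header provides. The macro names, `MAP_WORDS`, `struct bad_ports` and `bad_dynamic()` are hypothetical and not copied from in_pcb.h. One bit per port costs 8 KiB per protocol, which is what makes covering the whole range affordable.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_BITS	32			/* bits per map word */
#define MAP_WORDS	(65536 / MAP_BITS)	/* 2048 words cover every port */

static_assert(sizeof(uint32_t) * 8 == MAP_BITS, "map word must hold MAP_BITS bits");

/* Hypothetical set/clear/test helpers: word index p / 32, bit p % 32. */
#define PORTMAP_SET(m, p)	((m)[(p) / MAP_BITS] |= (1U << ((p) % MAP_BITS)))
#define PORTMAP_CLR(m, p)	((m)[(p) / MAP_BITS] &= ~(1U << ((p) % MAP_BITS)))
#define PORTMAP_ISSET(m, p)	(((m)[(p) / MAP_BITS] >> ((p) % MAP_BITS)) & 1U)

/* Separate maps for TCP and UDP, as with the two baddynamic sysctls. */
struct bad_ports {
	uint32_t tcp[MAP_WORDS];
	uint32_t udp[MAP_WORDS];
};

/* Nonzero if 'port' must not be handed out as a dynamic source port. */
int
bad_dynamic(const uint32_t map[MAP_WORDS], uint16_t port)
{
	return PORTMAP_ISSET(map, port);
}
```

For example, reserving TCP port 587 on a `struct bad_ports bp` would be `PORTMAP_SET(bp.tcp, 587)`.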


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
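The pointer pairing can be sketched with two reduced structures. `struct pcb`, `struct state_key`, `link_pair()`, `unlink_from_pcb()` and `unlink_from_key()` are hypothetical stand-ins for the inpcb and the pf state key; locking and reference counting, which real code needs, are omitted. The invariant is that either both pointers are set or both are NULL, so neither side is left pointing at freed memory.

```c
#include <assert.h>
#include <stddef.h>

struct state_key;

/* Hypothetical, heavily reduced stand-ins for inpcb and pf_state_key. */
struct pcb {
	struct state_key *sk;	/* cached pf state key, or NULL */
};

struct state_key {
	struct pcb *inp;	/* cached PCB, or NULL */
};

/* After the first successful lookup, remember the pairing on both sides. */
void
link_pair(struct pcb *inp, struct state_key *sk)
{
	assert(inp->sk == NULL && sk->inp == NULL);
	inp->sk = sk;
	sk->inp = inp;
}

/* The PCB is going away: clear both pointers. */
void
unlink_from_pcb(struct pcb *inp)
{
	if (inp->sk != NULL) {
		inp->sk->inp = NULL;
		inp->sk = NULL;
	}
}

/* The state key is going away: clear both pointers. */
void
unlink_from_key(struct state_key *sk)
{
	if (sk->inp != NULL) {
		sk->inp->sk = NULL;
		sk->inp = NULL;
	}
}
```

On later packets, a non-NULL `sk->inp` lets the inbound path skip the PCB lookup, and a non-NULL `inp->sk` lets the outbound path skip the state key lookup.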


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binaries
have no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
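The receive-side test behind RFC 3682 fits in a few lines. This is a sketch with hypothetical names (`min_ttl_for_hops()`, `ttl_acceptable()`); in the kernel the threshold is the value an application stores with setsockopt(2) at level IPPROTO_IP using IP_MINTTL, and it is compared against the TTL of received packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The sender transmits with TTL 255 and every router decrements it,
 * so a packet that crossed at most 'max_hops' routers arrives with
 * TTL >= 255 - max_hops.
 */
static uint8_t
min_ttl_for_hops(uint8_t max_hops)
{
	return (uint8_t)(255 - max_hops);
}

/* Accept only packets whose TTL did not drop below the configured minimum. */
bool
ttl_acceptable(uint8_t ttl, uint8_t minttl)
{
	return ttl >= minttl;
}
```

A directly connected peer corresponds to `min_ttl_for_hops(0) == 255`, so any packet that was routed even once is rejected.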


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@
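The reversed match order can be sketched as a choice between two candidate listeners. `struct listeners` and `listen_lookup()` are hypothetical, the strings merely stand in for sockets, and the boolean stands in for the mbuf tag pf sets on translated packets. Normally the socket bound to the specific address (for example localhost) wins over the wildcard bind; for translated packets the wildcard is tried first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listeners on one port: a specific bind and a wildcard bind. */
struct listeners {
	const char *specific;	/* e.g. bound to 127.0.0.1, or NULL */
	const char *wildcard;	/* bound to *, or NULL */
};

/*
 * Specific-before-wildcard is the normal order. If pf translated the
 * destination to localhost, swap the order so the redirected connection
 * does not appear local to a daemon that double-binds.
 */
const char *
listen_lookup(const struct listeners *l, bool translated_to_localhost)
{
	const char *first = l->specific, *second = l->wildcard;

	if (translated_to_localhost) {
		first = l->wildcard;
		second = l->specific;
	}
	return first != NULL ? first : second;
}
```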


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and cleanup as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


# 1.114 20-Sep-2018 bluhm

As a step towards per inpcb or socket locks, remove the net lock
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@


# 1.113 14-Sep-2018 bluhm

In general it is a bad idea to use one random secret for two things.
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@


# 1.112 14-Sep-2018 jsg

unbreak userland uses of in_pcb.h by including sys/refcnt.h
ok visa@


# 1.111 13-Sep-2018 bluhm

Add reference counting for inet pcb, this will be needed when we
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@


# 1.110 11-Sep-2018 bluhm

Make the distribution of in_ and in6_ functions in in_pcb.c and
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@


# 1.109 03-Jun-2018 bluhm

Rename the inpcb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
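
A minimal sketch of the growth rule described above, in C: the table doubles once the entry count reaches 75% of the bucket count. The names `pcb_table_should_grow` and `pcb_table_next_size` are hypothetical and only illustrate the arithmetic; the kernel's own resize code in in_pcb.c may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true once the number of hashed entries reaches
 * 75% of the bucket count.  Written as count * 4 >= size * 3 so that no
 * floating point or truncating division is needed.
 */
bool
pcb_table_should_grow(size_t count, size_t size)
{
	assert(size > 0);
	return count * 4 >= size * 3;
}

/* Hypothetical helper: the new bucket count is twice the old one. */
size_t
pcb_table_next_size(size_t size)
{
	return size * 2;
}
```

With 128 buckets, for example, the check first returns true once 96 entries are hashed, and the table then grows to 256 buckets.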


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with the IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
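
A minimal C sketch of such a full-range skip bitmask: one bit per port, 65536 bits packed into 2048 32-bit words, or 8 KB per protocol. The `struct portmap` type and the `portmap_*` functions are hypothetical names; the header's own DP_* macros (DP_CLR() was added in revision 1.8) serve the same purpose, but their exact definitions may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PORTMAP_BITS	32u			/* bits per uint32_t word */
#define PORTMAP_WORDS	(65536u / PORTMAP_BITS)	/* 2048 words, all ports */

/* Hypothetical bitmask: bit p set means "never pick port p dynamically". */
struct portmap {
	uint32_t word[PORTMAP_WORDS];
};

void
portmap_set(struct portmap *m, unsigned int port)
{
	assert(port < 65536);
	m->word[port / PORTMAP_BITS] |= 1u << (port % PORTMAP_BITS);
}

void
portmap_clr(struct portmap *m, unsigned int port)
{
	assert(port < 65536);
	m->word[port / PORTMAP_BITS] &= ~(1u << (port % PORTMAP_BITS));
}

bool
portmap_isset(const struct portmap *m, unsigned int port)
{
	assert(port < 65536);
	return (m->word[port / PORTMAP_BITS] >> (port % PORTMAP_BITS)) & 1u;
}
```

Spanning the whole range is what makes entries above 1024 possible, such as the X11 ports added to the TCP list in revision 1.66.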


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privilages to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt
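
A minimal sketch of the receive-side test that IP_MINTTL enables, using a hypothetical helper `minttl_accept`; the kernel presumably performs the equivalent comparison inline in its TCP input path, which this sketch does not reproduce. For a directly connected peer the sender transmits with TTL 255, so a minimum of 255 rejects any packet that crossed even one router.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: accept a packet only if its received TTL is at
 * least the socket's configured minimum.  A minimum of 0 (option unset)
 * lets every packet through, since any TTL is >= 0.
 */
bool
minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
	return pkt_ttl >= minttl;
}
```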


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there are inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."
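
The deadlock fix quoted above amounts to bounding the search for a free port: try each candidate in the range at most once, then give up. A minimal C sketch, assuming a hypothetical `used` table indexed by port number (the kernel consults its PCB tables instead) and a hypothetical function name `pick_port`; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: search [first, last] for a free port, starting at
 * `start` and wrapping around to `first`.  Each candidate is tried at most
 * once, so a fully used range ends the loop and returns 0 instead of
 * spinning forever.  used[p] != 0 means port p is already taken.
 */
uint16_t
pick_port(uint16_t first, uint16_t last, uint16_t start,
    const unsigned char *used)
{
	uint32_t span, i, offset;

	assert(first > 0 && first <= last);
	assert(start >= first && start <= last);

	span = (uint32_t)last - first + 1;
	for (i = 0; i < span; i++) {
		offset = ((uint32_t)start - first + i) % span;
		if (!used[first + offset])
			return (uint16_t)(first + offset);
	}
	return 0;	/* range exhausted; caller reports an error */
}
```

A caller would typically turn the 0 result into an error such as EADDRINUSE rather than retrying.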


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.109 03-Jun-2018 bluhm

Rename the incpb table field inpt_hash to inpt_mask as it contains
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@


# 1.108 02-Jun-2018 bluhm

Move the declarations of the raw ip and ip6 pcb tables into the
in_pcb.h header file.
OK mpi@ visa@


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents that the user accidentally configures
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisment's current hop limit.

Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Imputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assingment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it casted away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with a IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain save and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@
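A minimal sketch of a port bitmask that covers the full 65536-port space, assuming one bit per port packed into 32-bit words (2048 words, 8 KiB per protocol). DP_CLR is the macro name from the 1.8 entry; DP_SET, DP_ISSET, the word layout, and baddynamic_udp are assumptions for illustration, not copied from in_pcb.h. Ports 623 and 664 are the UDP ports named in the 1.54 and 1.55 entries.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per port: 65536 ports / 32 bits per word = 2048 words. */
#define DP_MAPBITS	32u
#define DP_MAPSIZE	(65536u / DP_MAPBITS)
#define DP_MAPMASK	(DP_MAPBITS - 1u)

/* Word index is port / 32, bit index is port % 32 (hypothetical macros). */
#define DP_SET(m, p)	((m)[(p) / DP_MAPBITS] |= (1u << ((p) & DP_MAPMASK)))
#define DP_CLR(m, p)	((m)[(p) / DP_MAPBITS] &= ~(1u << ((p) & DP_MAPMASK)))
#define DP_ISSET(m, p)	(((m)[(p) / DP_MAPBITS] & (1u << ((p) & DP_MAPMASK))) != 0)

/* Hypothetical table of UDP ports never handed out as ephemeral ports. */
static uint32_t baddynamic_udp[DP_MAPSIZE];

void
baddynamic_init(void)
{
	assert(DP_MAPSIZE * DP_MAPBITS == 65536u);	/* every port has a bit */
	DP_SET(baddynamic_udp, 623);	/* ASF/IPMI, per the 1.54 entry */
	DP_SET(baddynamic_udp, 664);	/* ASF/IPMI, per the 1.55 entry */
}

int
port_is_bad(uint16_t port)
{
	return DP_ISSET(baddynamic_udp, port);
}
```

A port picker built on this would simply skip any candidate for which port_is_bad() returns true.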


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this would make it easier for us to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidentally ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.107 30-Mar-2018 dhill

Store the allocation size in inpcbhead for free().

OK visa@


Revision tags: OPENBSD_6_3_BASE
# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with an OK from markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio
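A minimal sketch of the growth rule described above (double the table once entries reach 75% of the bucket count), assuming a chained hash table whose bucket count is a power of two and whose hash is a plain mask of the key. The real PCB table hashes with siphash (see the 1.87 entry) and its structures differ; struct entry, struct table, and the table_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
	struct entry	*next;
	uint32_t	 key;
};

struct table {
	struct entry	**buckets;
	size_t		  size;		/* bucket count, a power of two */
	size_t		  count;	/* entries stored */
};

int
table_init(struct table *t, size_t size)
{
	assert(size > 0 && (size & (size - 1)) == 0);	/* mask hashing needs 2^n */
	t->buckets = calloc(size, sizeof(*t->buckets));
	t->size = size;
	t->count = 0;
	return t->buckets == NULL ? -1 : 0;
}

/* Move every entry into a bucket array twice as large. */
static int
table_grow(struct table *t)
{
	size_t nsize = t->size * 2, i;
	struct entry **nb, *e, *next;

	if ((nb = calloc(nsize, sizeof(*nb))) == NULL)
		return -1;
	for (i = 0; i < t->size; i++) {
		for (e = t->buckets[i]; e != NULL; e = next) {
			next = e->next;
			e->next = nb[e->key & (nsize - 1)];
			nb[e->key & (nsize - 1)] = e;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
	return 0;
}

void
table_insert(struct table *t, struct entry *e)
{
	size_t h = e->key & (t->size - 1);

	e->next = t->buckets[h];
	t->buckets[h] = e;
	t->count++;
	/* Double once entries reach 75% of buckets: count / size >= 3 / 4. */
	if (t->count * 4 >= t->size * 3)
		(void)table_grow(t);	/* on failure keep using the old table */
}

struct entry *
table_lookup(const struct table *t, uint32_t key)
{
	struct entry *e;

	for (e = t->buckets[key & (t->size - 1)]; e != NULL; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}
```

Starting from 4 buckets, the third insert triggers the first doubling (3 * 4 >= 4 * 3), and the sixth insert doubles again to 16 buckets. Comparing with integer multiplication avoids floating point in the load-factor check.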


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that as it cast away the const attribute when it passes
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with the IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows to bind interfaces to
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NICs are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return the number of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys


Revision tags: OPENBSD_3_0_BASE UBC_BASE
# 1.38 05-Jul-2001 jjbg

branches: 1.38.4;
IPComp itself (include files). angelos@ ok.


# 1.37 12-Jun-2001 angelos

IPsec-related socket options; these can be set/removed/retrieved, but
are not taken into consideration in anything just yet.


# 1.36 09-Jun-2001 angelos

Inclusion protection.


# 1.35 27-May-2001 angelos

Keep local authentication material on the PCB.


# 1.34 21-May-2001 angelos

Use a reference-counted structure for IPsec IDs and credentials, so we
can cheaply keep copies of them at the PCB. ok deraadt@


Revision tags: OPENBSD_2_9_BASE
# 1.33 28-Mar-2001 angelos

Allow tdbi's to appear in mbufs throughout the stack; this allows
security properties of the packets to be pushed up to the application
(not done yet). Eventually, this will be turned into a packet
attributes framework.

Make sure tdbi's are free'd/cleared properly whenever drivers (or NFS)
do weird things with mbufs.


# 1.32 16-Feb-2001 itojun

pull in new pcb notification code from kame. better handling of scope address.


# 1.31 16-Feb-2001 itojun

move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync better with kame


# 1.30 08-Feb-2001 itojun

switch raw ip6 socket code from NRL to kame.
makes upgrades/code sharing much easier.


Revision tags: OPENBSD_2_8_BASE
# 1.29 11-Oct-2000 itojun

nuke inp_flags bits for controlling IPv4 mapped address.
we don't support IPv4 mapped address,
and there is inconsistent bit manipulation code so it's safer to nuke them.


# 1.28 10-Oct-2000 provos

verify payload of the icmp need fragment message at the tcp layer. okay itojun@


# 1.27 09-Oct-2000 provos

check if we have a tcb connected to the destination quoted in the icmp need
fragment message when doing path mtu discovery. okay angelos@


# 1.26 18-Sep-2000 provos

Path MTU discovery based on NetBSD but with the decision to use the DF
flag delayed to ip_output(). That halves the code and reduces most of
the route lookups. okay deraadt@


# 1.25 18-Jun-2000 itojun

sync with KAME udp6_output(). udp output logic is very different between
IPv4/v6 so the separation should make more sense.

TODO: remove IPv6 case from udp_output()
TODO: remove/comment out/#if 0 IPv4 mapped address cases


# 1.24 13-Jun-2000 itojun

allow link-local IPv6 address in in6_pcbbind.


Revision tags: OPENBSD_2_7_BASE
# 1.23 27-Apr-2000 millert

add TCP port 587 to default list of reserved ports not to allocate dynamically in order to reserve it for sendmail.


Revision tags: SMP_BASE
# 1.22 07-Feb-2000 itojun

branches: 1.22.2;
fix include file path related to ip6.


# 1.21 11-Jan-2000 angelos

Remove ifdef'ed out definitions.


# 1.20 27-Dec-1999 itojun

synchronize inp_flags definition across kame/*bsd.
this should make it easier to implement future COMPAT_*BSD.

(sync with kame tree)


# 1.19 12-Dec-1999 itojun

make it easier to synchronize INP_xx flags and IN6P_xx flags.


Revision tags: kame_19991208
# 1.18 08-Dec-1999 itojun

bring in KAME IPv6 code, dated 19991208.
replaces NRL IPv6 layer. reuses NRL pcb layer. no IPsec-on-v6 support.
see sys/netinet6/{TODO,IMPLEMENTATION} for more details.

GENERIC configuration should work fine as before. GENERIC.v6 works fine
as well, but you'll need KAME userland tools to play with IPv6 (they will be
brought in soon).


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE
# 1.17 27-Mar-1999 provos

add SADB_X_BINDSA to pfkey allowing incoming SAs to refer to an outgoing
SA to be used, use this SA in ip_output if available. allow mobile road
warriors for bind SAs with wildcard dst and src addresses. check IPSEC
AUTH and ESP level when receiving packets, drop them if protection is
insufficient. add stats to show dropped packets because of insufficient
IPSEC protection. -- phew. this was all done in canada. dugsong and linh
provided the ride and company.


# 1.16 24-Mar-1999 cmetz

Removed inclusion of netinet6/in6.h. This was an artifact of when the core
IPv6 symbols were there rather than in netinet/in.h, and now not only is
unnecessary but also could create problems (see PR library/781).


# 1.15 11-Jan-1999 deraadt

netinet merge of NRL stuff. some indent and shrinkage needed; NRL/cmetz


# 1.14 08-Jan-1999 deraadt

more IPV6 merge; cmetz


# 1.13 07-Jan-1999 deraadt

INET6 support


# 1.12 07-Jan-1999 deraadt

in_pcblookup() now takes ptr to both ip address arguments


# 1.11 07-Jan-1999 deraadt

rename baddynamic() to in_baddynamic(), and export it


Revision tags: OPENBSD_2_4_BASE
# 1.10 18-May-1998 provos

first step to the setsockopt/getsockopt interface as described in
draft-mcdonald-simple-ipsec-api, kernel notifies (EMT_REQUESTSA) signal
userland key management applications when security services are requested.
this is only for outgoing connections at the moment, incoming packets
are not yet checked against the selected socket policy.


Revision tags: OPENBSD_2_2_BASE OPENBSD_2_3_BASE
# 1.9 26-Aug-1997 deraadt

indent


# 1.8 19-Aug-1997 millert

Add DP_CLR() macro


# 1.7 19-Aug-1997 millert

Theo doesn't like extra kernel options, so don't allow
DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden from the kernel. It's not really too useful
since there is a nice sysctl interface for this stuff.


# 1.6 16-Aug-1997 millert

Allow DEFBADDYNAMICPORTS_TCP and DEFBADDYNAMICPORTS_UDP to be
overridden via kernel config file.


# 1.5 09-Aug-1997 millert

The list of tcp/udp ports not to allocate dynamically is now
a bitmask configurable via sysctl([38]). The default values
have not changed. If one wants to change the list it should
be done early on in /etc/rc.


Revision tags: OPENBSD_2_1_BASE
# 1.4 28-Feb-1997 angelos

Moved IPsec socket state to the PCB.


Revision tags: OPENBSD_2_0_BASE
# 1.3 29-Jul-1996 downsj

From FreeBSD (with slightly different sysctl names):

"... Allow the user to nominate one of three ranges of port numbers as
candidates for selecting a local address to replace a zero port number.
The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg)
call. The three ranges are: default, high (to bypass firewalls) and
low (to get a port below 1024).

The default and high port ranges are sysctl settable under sysctl
net.inet.ip.portrange.* [net.inet.ip.portfirst, net.inet.ip.portlast,
net.inet.ip.porthifirst, and net.inet.ip.porthilast currently in OpenBSD.]

This code also fixes a potential deadlock if the system accidently ran out
of local port addresses. It'd drop into an infinite while loop.

The secure port selection (for root) should reduce overheads and increase
reliability of rlogin/rlogind/rsh/rshd if they are modified to take
advantage of it."


# 1.2 03-Mar-1996 niklas

From NetBSD: 960217 merge


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.106 01-Dec-2017 bluhm

Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST
security check prevents the user from accidentally configuring
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@


# 1.105 06-Oct-2017 bluhm

Kill the divert-packet socket option IP_DIVERTFL to filter packets.
It used a loop over the global list divbtable that would be hard
to make MP safe. The port net/dnsfilter does not work without this,
it should be converted to divert-to. Neither other ports nor base
use this filter feature.
ports checked by sthen@; OK mpi@ benno@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.104 03-Sep-2016 phessler

Reserve the BFD destination ports in baddynamic

OK claudio@, henning@


# 1.103 04-Aug-2016 vgross

Commit in6_selectsrc() split again, with missing assignment fixed.


Revision tags: OPENBSD_6_0_BASE
# 1.102 22-Jul-2016 mpi

Revert in_selectsrc() refactoring, it breaks IPv6.

Reported by Heiko on bugs@.

ok stsp@, claudio@


# 1.101 20-Jul-2016 vgross

Split in6_selectsrc() into a low-level part and a pcb-level part, and
convert in_selectsrc() prototype to match.

Ok bluhm@ mpi@.


# 1.100 27-Jun-2016 jca

Implement IPV6_MINHOPCOUNT support.

Useful to implement GTSM support in daemons such as bgpd(8). Diff from
2013 revived by renato@. Input from bluhm@, ok bluhm@ deraadt@


# 1.99 18-Jun-2016 vgross

Add net.inet.{tcp,udp}.rootonly sysctl, to mark which ports
cannot be bound to by non-root users.

Ok millert@ bluhm@


# 1.98 11-Apr-2016 vgross

Rename in_pcblookup() to in_pcblookup_local() and change its prototype
to get rid of the now useless foreign address and ports parameters.

ok mpi@


# 1.97 05-Apr-2016 vgross

Move inp_laddr assignment after in_pcbpickport(), extend in_pcbpickport()
as needed.

Ok bluhm@


# 1.96 23-Mar-2016 vgross

Merge in_pcbbind() and in6_pcbbind(), and change every call to
in6_pcbbind() into in_pcbbind().

Ok jca@ mpi@


# 1.95 23-Mar-2016 vgross

Extract in_pcbaddrisavail() from in_pcbbind().

ok jca@


# 1.94 21-Mar-2016 vgross

Extract in6_pcbaddrisavail() from in6_pcbbind(), and use it when
checking for source availability in udp6_output(); This time with
all the files.

Ok jca@ bluhm@


Revision tags: OPENBSD_5_9_BASE
# 1.93 03-Dec-2015 tedu

rm unused kernel only IPV6_RECVRTHDRDSTOPTS sockopt. ok deraadt sthen


# 1.92 02-Dec-2015 vgross

Move port picking away from in_pcbbind()

ok sthen@


# 1.91 24-Oct-2015 mpi

Ignore Router Advertisement's current hop limit.

Apart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.

Inputs from bluhm@, ok phessler@, florian@, bluhm@


# 1.90 22-Sep-2015 vgross

Remove inpt_lastport from struct inpcbtable, use local variables
in in_pcbbind() and in6_pcbsetport()

ok claudio@, with input from David Hill


Revision tags: OPENBSD_5_8_BASE
# 1.89 16-Apr-2015 markus

remove unfinished/unused support for socket-attached ipsec-policies
ok mikeb


# 1.88 14-Apr-2015 mikeb

Remove support for storing credentials and auth information in the kernel.

This code is largely unfinished and is not used for anything. The change
leaves identities as only objects referenced by ipsec_ref structure and
their handling requires some changes to support more advanced matching of
IPsec connections.

No objections from reyk and hshoexer, with and OK markus.


Revision tags: OPENBSD_5_7_BASE
# 1.87 15-Nov-2014 dlg

use siphash in the in_pcb hashing. this mitigates it against flooding
attacks.

this is a textbook use of siphash.

the idea of using siphash for this came from yasuoka-san, but i had
the time to do it. he also tested and tweaked this diff.

ok yasuoka@ mikeb@


Revision tags: OPENBSD_5_6_BASE
# 1.86 12-Jul-2014 yasuoka

Resize the pcb hashtable automatically. The table size will be doubled
when the number of the hash entries reaches 75% of the table size.

ok dlg henning, 'commit in' claudio


# 1.85 18-Apr-2014 jca

Invert the signature logic of in{,6}_selectsrc, make them return the
error code and pass the resulting source address back to the caller
through a pointer, as suggested by chrisz. This gives us more readable
code, and eases the deletion of useless checks in the callers' error path.
Add a bunch of "0 -> NULL" conversions, while here.
ok chrisz@ mpi@


# 1.84 16-Apr-2014 mpi

Merge in_fixaddr() into in_selectsrc() in order to prepare for
IP_SENDSRCADDR support. This reduces the differences with the
IPv6 version and kill some comments that are no longer true.

ok jca@, chrisz@, mikeb@


# 1.83 06-Apr-2014 chrisz

factor out source and destination address mangling from in_pcbconnect()
for later reuse in udp_output().

"Apart from that OK" claudio@


Revision tags: OPENBSD_5_5_BASE
# 1.82 20-Dec-2013 krw

Switch inpt_queue from CIRCLEQ to TAILQ. Thus ending use of CIRCLEQ
in the base. Ports fixes to follow shortly for the two ports (gkrellm
and net-snmp) affected.

ok zhuk@ millert@


# 1.81 23-Oct-2013 deraadt

Back when some NRL code was merged into KAME to create the *BSD IPV6
stack (factoid: by a bunch of people in my living room), some compatibility
#define's were created to shim incompatible inpcb access methods. There
was an understanding they would eventually be removed. Since they are
error prone, and 1999 is a long time ago, now they die.
ok mikeb claudio mpi


# 1.80 20-Oct-2013 phessler

Put a large chunk of the IPv6 rdomain support in-tree.

Still some important missing pieces, and this is not yet enabled.

OK bluhm@


Revision tags: OPENBSD_5_4_BASE
# 1.79 31-May-2013 bluhm

The function rip6_ctlinput() claims that sa6_src is constant to
allow the assignment of &sa6_any. But rip6_ctlinput() could not
guarantee that, as it cast away the const attribute when it passed
the pointer to in6_pcbnotify(). Replace sockaddr with const
sockaddr_in6 in the in6_pcbnotify() parameters. This reduces the
number of casts. Also adjust in6_pcbhashlookup() to handle the
const attribute correctly.
Input and OK claudio@


# 1.78 17-May-2013 mpi

Move an extern declaration into its corresponding header file.


# 1.77 29-Mar-2013 bluhm

Declare struct pf_state_key in the mbuf and in_pcb header files to
avoid ugly casts.
OK krw@ tedu@


# 1.76 14-Mar-2013 mpi

tedu faith(4), suggested by todd@ some weeks ago after a submission by
dhill.

ok krw@, mikeb@, tedu@ (implicit)


Revision tags: OPENBSD_5_3_BASE
# 1.75 16-Jan-2013 bluhm

Pass struct inpcb pointer to in_pcb...() functions instead of void
pointer. Allows stricter type checking. No functional change.
OK claudio@


# 1.74 21-Oct-2012 benno

Add the IP_DIVERTFL socket option on divert(4) sockets to control
which packets (as in direction) of the traffic will be diverted
through the divert socket.
ok claudio@, henning@


# 1.73 17-Sep-2012 yasuoka

add IPV6_RECVDSTPORT socket option, which enables us to get original
(= before divert) destination port of a UDP packet. The way to use
this option is the same as for IP_RECVDSTPORT.

from UMEZAWA Takeshi
tweaks from jmc; ok henning bluhm


Revision tags: OPENBSD_5_2_BASE
# 1.72 16-Jul-2012 markus

add IP_IPSECFLOWINFO option to sendmsg() and recvmsg(), so npppd(4)
can use this to select the IPsec tunnel for sending L2TP packets.
this fixes Windows (always binding to 1701) and Android clients
(negotiating wildcard flows); feedback mpf@ and yasuoka@;
ok henning@ and yasuoka@; ok jmc@ for the manpage


Revision tags: OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.71 15-Jun-2011 mikeb

Add IP_RECVRTABLE socket option to be used with an IPPROTO_IP
level that allows one to retrieve the original routing domain
of UDP datagrams diverted by the pf via "divert-to" with a
recvmsg(2).

ok claudio


Revision tags: OPENBSD_4_9_BASE
# 1.70 23-Sep-2010 yasuoka

add a new IP level socket option IP_PIPEX. This option is used for L2TP
support by pipex.
OK henning@, "Carry on" blambert@


Revision tags: OPENBSD_4_8_BASE
# 1.69 03-Jul-2010 guenther

Fix the naming of interfaces and variables for rdomains and rtables
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.

Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.

Written by claudio@, criticized^Wcritiqued by me


Revision tags: OPENBSD_4_7_BASE
# 1.68 13-Nov-2009 claudio

Extend the protosw pr_ctlinput function to include the rdomain. This is
needed so that the route and inp lookups done in TCP and UDP know where
to look. Additionally in_pcbnotifyall() and tcp_respond() got a rdomain
argument as well for similar reasons. With this tcp seems to be now
fully rdomain safe and no longer leaks single packets into the main domain.
Looks good markus@, henning@


Revision tags: OPENBSD_4_6_BASE
# 1.67 05-Jun-2009 claudio

Initial support for routing domains. This allows binding interfaces to
alternate routing tables and separating them from other interfaces in distinct
routing tables. The same network can now be used in any domain at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks across
net and netinet. L2 and IPv4 are mostly covered; pf and IPv6 are still missing.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@


Revision tags: OPENBSD_4_4_BASE OPENBSD_4_5_BASE
# 1.66 10-Jul-2008 djm

add X11 ports to default TCP baddynamic list


# 1.65 09-Jul-2008 djm

expand the net.inet.(tcp|udp).baddynamic dynamic source port
skipping bitmasks to cover the entire 65536 port space - previously
they covered 512-1024 only.

sysctl needs to be updated to cope with this change; please
"make includes" before rebuilding it.

feedback millert@ ok millert@ deraadt@ markus@


# 1.64 03-Jul-2008 henning

link pf state keys to tcp pcbs and vice versa.
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan


# 1.63 23-May-2008 thib

Deal with the situation when TCP nfs mounts timeout and processes
get hung in nfs_reconnect() because they do not have the proper
privileges to bind to a socket, by adding a struct proc * argument
to sobind() (and the *_usrreq() routines, and finally in{6}_pcbbind)
and do the sobind() with proc0 in nfs_connect.

OK markus@, blambert@.
"go ahead" deraadt@.

Fixes an issue reported by bernd@ (Tested by bernd@).
Fixes PR5135 too.


# 1.62 15-May-2008 markus

divert for ipv6; ok henning, pyr


# 1.61 09-May-2008 markus

IP_RECVDSTPORT, allows you to get the destination port of UDP datagrams
for pf(4) diverted packets; based on patch by Scot Loach; ok beck@


# 1.60 09-May-2008 markus

divert packets to local socket without modifying the ip header;
makes transparent proxies much easier; ok beck@, feedback claudio@


Revision tags: OPENBSD_4_1_BASE OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.59 22-Feb-2007 millert

Remove TCP ports 760 and 761 from DEFBADDYNAMICPORTS_TCP; they are
not used with Heimdal. Add UDP port 749 to DEFBADDYNAMICPORTS_UDP
for consistency with DEFBADDYNAMICPORTS_TCP. We retain some Kerberos
4 ports for people running Heimdal in Kerberos 4 compat mode.
OK deraadt@ beck@


# 1.58 09-Dec-2006 itojun

switch IPv6 advanced API from RFC2292 to RFC3542 (2292 is superseded by 3542).
the kernel still handles RFC2292 set/getsockopts, so that compiled binary
has no trouble running. userland sees RFC3542 symbols only on header file
so new code has to use RFC3542 API.

bump libc shlib minor for function additions.

tested on i386/amd64 by jmc, i386 by brad. checked by deraadt.


# 1.57 11-Oct-2006 henning

implement IP_MINTTL socket option for tcp sockets
This is for RFC3682 aka the TTL security hack - sender sets TTL to 255,
receiver checks no router on the way (or, no more than expected) reduced
the TTL. carp uses that technique already.
modeled after FreeBSD implementation.
ok claudio djm deraadt


# 1.56 11-Oct-2006 henning

implement IP_RECVTTL socket option.
when set on raw or udp sockets, userland receives the incoming packet's TTL
as ancillary data (cmsg shitz). modeled after the FreeBSD implementation.
ok claudio djm deraadt


# 1.55 26-Sep-2006 deraadt

udp port 664 is sometimes also stolen on the wire by ipmi/asf baloney.
Did these vendors really really really not think? Absolute morons.


Revision tags: OPENBSD_4_0_BASE
# 1.54 30-May-2006 deraadt

Put ASF/IPMI port 623 into the bad dynamic udp table, because otherwise
we will randomly choose that stupid port, which NIC's are sometimes
programmed to eat invisibly; sthen@bootes.spacehopper.org, pr5139


# 1.53 29-May-2006 claudio

Make savecontrol functions more generic and use them now for raw IP too.
Additionally add the IP_RECVIF option which returns the interface a packet
was received on. OK markus@ norby@


Revision tags: OPENBSD_3_9_BASE
# 1.52 10-Dec-2005 deraadt

in ansi c, bitfields must be done against int, unsigned int, or _Bool.
so we must start to use u_int; ok cloder


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.51 10-Aug-2004 markus

remove in_pcbnotify, it is no longer used.


Revision tags: SMP_SYNC_A
# 1.50 12-Jun-2004 itojun

support IPV6_USE_MIN_MTU (forgot to commit the file, sorry). noted by Anil


Revision tags: OPENBSD_3_5_BASE SMP_SYNC_B
# 1.49 21-Dec-2003 markus

change in*_pcbnotify to return numbers of matches; ok itojun, mcbride, henning


# 1.48 08-Dec-2003 mcbride

Mbuf tag tcp and udp packets which are translated to localhost, and
use the presence of this tag to reverse the match order in
in{6}_pcblookup_listen(). Some daemons (such as portmap) do a double
bind, binding to both * and localhost in order to differentiate local
from non-local connections, and potentially granting more privilege to
local ones. This change ensures that redirected connections to localhost
do not appear local to such a daemon.

Bulk of changes from dhartmei@, some changes markus@

ok dhartmei@ deraadt@


# 1.47 04-Nov-2003 markus

add in(6)_pcblookup_listen() and replace all calls to in_pcblookup()
with either in(6)_pcbhashlookup() or in(6)_pcblookup_listen();
in_pcblookup is now only used by bind(2); speeds up pcb lookup for
listening sockets; from Claudio Jeker


# 1.46 25-Oct-2003 markus

additional hash for local port; improves speed of implicit bind
from >1000K cpu cycles to 20-30K for 18000 sockets on i386;
test+feedback by Claudio Jeker; ok itojun@;
[make sure you rebuild netstat/systat, too]


Revision tags: OPENBSD_3_4_BASE
# 1.45 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_A UBC_SYNC_B
# 1.44 04-Sep-2002 itojun

pass struct proc * down to in6_pcbsetport


# 1.43 09-Jun-2002 itojun

whitespace


# 1.42 08-Jun-2002 itojun

sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.


# 1.41 31-May-2002 angelos

Keep a policy attached to each socket (that needs it), and clean up as
needed on socket tear-down.


Revision tags: OPENBSD_3_1_BASE
# 1.40 14-Mar-2002 millert

Final __P removal plus some cosmetic fixups


# 1.39 14-Mar-2002 millert

First round of __P removal in sys

