History log of /netbsd-current/sys/dist/pf/net/pf.c
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.87 04-Nov-2022 ozaki-r

inpcb: rename functions to in6pcb_*


# 1.86 04-Nov-2022 ozaki-r

inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.


# 1.85 28-Oct-2022 ozaki-r

Adjust pf, wg, dccp and sctp for struct inpcb integration


Revision tags: bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.84 10-Aug-2020 rin

Clean up _LKM --> _MODULE leftovers.

Note that _KERNEL is always defined for modules.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.83 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.82 11-Jul-2018 maxv

Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.81 03-May-2018 maxv

branches: 1.81.2;
Remove m_copy completely.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.80 19-Feb-2018 christos

branches: 1.80.2;
It is normal for socket credentials to be missing for incoming sockets,
so don't warn.


# 1.79 18-Feb-2018 christos

PR/53036: Alexander Nasonov: 'block user' in pf's ruleset panics 8.0_BETA
Check for NULL.


# 1.78 09-Feb-2018 maxv

Oh, what is this. Fix a remotely-triggerable integer overflow: the way we
define TCPOLEN_SACK makes it unsigned, and the comparison in the while()
is unsigned too. That's not the expected behavior, the original code
wanted a signed comparison.

It's pretty easy to make 'hlen' go negative and trigger a buffer overflow.

This bug was reported 8 years ago by Lucio Albornoz in PR/44059.


Revision tags: tls-maxphys-base-20171202
# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

branches: 1.76.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.2; 1.72.4; 1.72.6; 1.72.10;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

branches: 1.68.2; 1.68.6; 1.68.8;
do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.85 28-Oct-2022 ozaki-r

Adjust pf, wg, dccp and sctp for struct inpcb integration


Revision tags: bouyer-sunxi-drm-base thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.84 10-Aug-2020 rin

Clean up _LKM --> _MODULE leftovers.

Note that _KERNEL is always defined for modules.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.83 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.82 11-Jul-2018 maxv

Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.81 03-May-2018 maxv

branches: 1.81.2;
Remove m_copy completely.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.80 19-Feb-2018 christos

branches: 1.80.2;
It is normal for socket credentials to be missing for incoming sockets,
so don't warn.


# 1.79 18-Feb-2018 christos

PR/53036: Alexander Nasonov: 'block user' in pf's ruleset panics 8.0_BETA
Check for NULL.


# 1.78 09-Feb-2018 maxv

Oh, what is this. Fix a remotely-triggerable integer overflow: the way we
define TCPOLEN_SACK makes it unsigned, and the comparison in the while()
is unsigned too. That's not the expected behavior, the original code
wanted a signed comparison.

It's pretty easy to make 'hlen' go negative and trigger a buffer overflow.

This bug was reported 8 years ago by Lucio Albornoz in PR/44059.


Revision tags: tls-maxphys-base-20171202
# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

branches: 1.76.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.2; 1.72.4; 1.72.6; 1.72.10;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

branches: 1.68.2; 1.68.6; 1.68.8;
do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.84 10-Aug-2020 rin

Clean up _LKM --> _MODULE leftovers.

Note that _KERNEL is always defined for modules.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 netbsd-9-0-RELEASE netbsd-9-0-RC2 ad-namecache-base2 ad-namecache-base1 ad-namecache-base netbsd-9-0-RC1 phil-wifi-20191119 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.83 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.82 11-Jul-2018 maxv

Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.81 03-May-2018 maxv

branches: 1.81.2;
Remove m_copy completely.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.80 19-Feb-2018 christos

branches: 1.80.2;
It is normal for socket credentials to be missing for incoming sockets,
so don't warn.


# 1.79 18-Feb-2018 christos

PR/53036: Alexander Nasonov: 'block user' in pf's ruleset panics 8.0_BETA
Check for NULL.


# 1.78 09-Feb-2018 maxv

Oh, what is this. Fix a remotely-triggerable integer overflow: the way we
define TCPOLEN_SACK makes it unsigned, and the comparison in the while()
is unsigned too. That's not the expected behavior, the original code
wanted a signed comparison.

It's pretty easy to make 'hlen' go negative and trigger a buffer overflow.

This bug was reported 8 years ago by Lucio Albornoz in PR/44059.


Revision tags: tls-maxphys-base-20171202
# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

branches: 1.76.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.2; 1.72.4; 1.72.6; 1.72.10;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

branches: 1.68.2; 1.68.6; 1.68.8;
do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126 pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.83 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)


Revision tags: pgoyette-compat-0728
# 1.82 11-Jul-2018 maxv

Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.81 03-May-2018 maxv

Remove m_copy completely.


Revision tags: pgoyette-compat-0502 pgoyette-compat-0422 pgoyette-compat-0415 pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.80 19-Feb-2018 christos

branches: 1.80.2;
It is normal for socket credentials to be missing for incoming sockets,
so don't warn.


# 1.79 18-Feb-2018 christos

PR/53036: Alexander Nasonov: 'block user' in pf's ruleset panics 8.0_BETA
Check for NULL.


# 1.78 09-Feb-2018 maxv

Oh, what is this. Fix a remotely-triggerable integer overflow: the way we
define TCPOLEN_SACK makes it unsigned, and the comparison in the while()
is unsigned too. That's not the expected behavior, the original code
wanted a signed comparison.

It's pretty easy to make 'hlen' go negative and trigger a buffer overflow.

This bug was reported 8 years ago by Lucio Albornoz in PR/44059.


Revision tags: tls-maxphys-base-20171202
# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

branches: 1.76.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.2; 1.72.4; 1.72.6; 1.72.10;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

branches: 1.68.2; 1.68.6; 1.68.8;
do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.78 09-Feb-2018 maxv

Oh, what is this. Fix a remotely-triggerable integer overflow: the way we
define TCPOLEN_SACK makes it unsigned, and the comparison in the while()
is unsigned too. That's not the expected behavior, the original code
wanted a signed comparison.

It's pretty easy to make 'hlen' go negative and trigger a buffer overflow.

This bug was reported 8 years ago by Lucio Albornoz in PR/44059.


Revision tags: tls-maxphys-base-20171202
# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

branches: 1.76.6;
Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.2; 1.72.4; 1.72.6; 1.72.10;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

branches: 1.68.2; 1.68.6; 1.68.8;
do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.77 31-Oct-2017 christos

PR/52682: David Binderman: Fix wrong assignment (in the !__NetBSD__ code)


Revision tags: matt-nb8-mediatek-base nick-nhusb-base-20170825 perseant-stdc-iso10646-base netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.76 14-Feb-2017 ozaki-r

Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

branches: 1.75.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.4;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.76 14-Feb-2017 ozaki-r

Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced


Revision tags: nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.75 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

branches: 1.74.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.4;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision


# 1.75 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104 nick-nhusb-base-20161004 localcount-20160914 pgoyette-localcount-20160806 pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.74 20-Jun-2016 knakahara

apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 1.73 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base nick-nhusb-base-20160529 netbsd-7-0-1-RELEASE nick-nhusb-base-20160422 nick-nhusb-base-20160319 nick-nhusb-base-20151226 netbsd-7-0-RELEASE nick-nhusb-base-20150921 netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.72 25-Jul-2014 ozaki-r

branches: 1.72.4;
Unbreak the build of pf


# 1.71 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 rmind-smpnet-nbase rmind-smpnet-base
# 1.70 20-Oct-2013 christos

branches: 1.70.2;
fix compiler warnings


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.69 22-Mar-2012 drochner

branches: 1.69.2; 1.69.4;
remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.68 19-Dec-2011 drochner

do missing ipsec->kame_ipsec renames


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base
# 1.67 19-Nov-2011 tls

branches: 1.67.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.


Revision tags: jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.66 29-Aug-2011 jmcneill

branches: 1.66.2;
build pf module with WARNS=3, and remove the need for -Wno-shadow


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.65 03-May-2011 dyoung

Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.64 07-May-2010 degroote

branches: 1.64.2;
Add support for pfs(8)

pfs(8) is a tool similar to ipfs(8) but for pf(4). It allows the admin to
dump internal configuration of pf, and restore at a latter point, after a
maintenance reboot for example, in a transparent way for user.

This work has been done mostly during my GSoC 2009

No objections on tech-net@


Revision tags: uebayasi-xip-base1
# 1.63 12-Apr-2010 ahoka

- Make the pf and pflog driver able to detach.
- Add code for module support.

Original patch from Jared McNeill


# 1.62 12-Apr-2010 skrll

Spello in comment.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base
# 1.61 19-Jan-2010 pooka

branches: 1.61.2; 1.61.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.


# 1.60 30-Dec-2009 elad

Replace uidinfo.h with kauth.h, should fix problems observed by tron@.


# 1.59 30-Dec-2009 elad

Use the right member to store gid in the non-NetBSD case.

Pointed out by uebayasi@ and cegger@, thanks!


# 1.58 30-Dec-2009 elad

Get uid/gid from the socket's credentials.


Revision tags: matt-premerge-20091211 yamt-nfs-mp-base8 jym-xensuspend-nbase
# 1.57 14-Sep-2009 degroote

Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@


Revision tags: yamt-nfs-mp-base7
# 1.56 28-Jul-2009 minskim

Remove LKM code from pf.


Revision tags: jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5
# 1.55 16-Jun-2009 minskim

Reduce diff with OpenBSD. No functional change.


Revision tags: yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.54 13-Apr-2009 christos

Fix http://www.securityfocus.com/archive/1/502634, from OpenBSD.
XXX: should be pulled up to 5.x


Revision tags: netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 haad-dm-base mjf-devfs2-base
# 1.53 11-Oct-2008 pooka

branches: 1.53.2; 1.53.4; 1.53.8;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.


Revision tags: wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase simonb-wapbl-base wrstuden-revivesa-base
# 1.52 18-Jun-2008 yamt

branches: 1.52.2;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@


Revision tags: yamt-pf42-base4 yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 yamt-nfs-mp-base yamt-pf42-base
# 1.51 15-Apr-2008 thorpej

branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8;
Make ip6 and icmp6 stats per-cpu.


# 1.50 12-Apr-2008 thorpej

Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.49 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


# 1.48 08-Apr-2008 thorpej

Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.


# 1.47 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.46 06-Apr-2008 thorpej

Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


# 1.45 06-Apr-2008 thorpej

Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base bouyer-xeni386-nbase bouyer-xeni386-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.44 14-Jan-2008 dyoung

branches: 1.44.6;
Change rtcache_init()+rtcache_getrt() and
rtcache_init_noclone()+rtcache_getrt() to single rtcache_init()
and rtcache_init_clone() calls.


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.43 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2
# 1.42 11-Dec-2007 lukem

use __KERNEL_RCSID()


Revision tags: yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.41 28-Nov-2007 dyoung

branches: 1.41.2; 1.41.4; 1.41.6;
Bug fix: make pf_route() set M_CSUM_IPV4 before calling ip_fragment().

If you use a route-to rule such as 'pass out quick on ath0 route-to
gre2 all', and the MTU on gre2 is smaller than the MTU on ath0,
then pf_route() will fragment your packet by calling ip_fragment().
Because pf_route() did not set M_CSUM_IPv4, ip_fragment() would
not compute the checksum on the fragments, and PF would send IP
fragments with bad checksums out of gre2.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking jmcneill-base bouyer-xenamd64-base2 yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.40 07-Aug-2007 yamt

branches: 1.40.2; 1.40.8;
reduce diff.


Revision tags: matt-mips64-base nick-csl-alignment-base mjf-ufs-trans-base
# 1.39 17-May-2007 christos

branches: 1.39.2; 1.39.6;
Coverity CID 3157: remove bogus break.


Revision tags: yamt-idlelwp-base8
# 1.38 10-May-2007 dyoung

pfctl: extend pf.conf(5) syntax. Let the operator supply an optional
"state lock" flag (if-bound, gr-bound, floating) at the end of a
NAT rule. The new syntax is backwards-compatbile with the old
syntax.

PF (kernel): change the macro BOUND_IFACE() to the inline function
bound_iface(), and add a new argument, the applicable NAT rule.
Use both the flags on the applicable filter rule and on the applicable
NAT rule to decide whether or not to bind a state to the interface
or the group where it is created.


# 1.37 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.36 04-Mar-2007 christos

branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.35 17-Feb-2007 dyoung

In pf_rtlabel_match, use rtcache_free()/rtcache_init(). This is
just cosmetic, since the whole routine is presently #if 0'd.


Revision tags: post-newlock2-merge newlock2-nbase yamt-splraiseipl-base5 yamt-splraiseipl-base4 newlock2-base
# 1.34 15-Dec-2006 joerg

branches: 1.34.2;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


# 1.33 13-Dec-2006 matt

Don't apply a window scale to the window size in a SYN packet.


Revision tags: yamt-splraiseipl-base3
# 1.32 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.31 04-Dec-2006 dyoung

Indent these macros for readability. People have to read this
code, too.


# 1.30 04-Dec-2006 dyoung

Lightly constify. Helps compile-time checking that we are not
scribbling over shared or read-only memory---e.g., in mbufs.


# 1.29 04-Dec-2006 dyoung

No need for a struct route_in6 in pf_route6(). Replace it with a
sockaddr_in6.

In pf_calc_mss(), factor common code out of PF_INET and PF_INET6
switch cases.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.28 16-Nov-2006 christos

branches: 1.28.2; 1.28.8;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.27 12-Oct-2006 peter

Merge the peter-altq branch.

(sync with KAME & add support for using ALTQ with pf(4)).


# 1.26 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.25 07-Oct-2006 peter

PR/34746: Nino Dehne: pf(4)'s synproxy state breaks when used with tags

Apply OpenBSD src/sys/net/pf.c rev 1.486 and 1.487:

1.486:
When synproxy sends packets to the destination host, make sure to copy
the 'tag' from the original state entry into the outgoing mbuf.

1.487:
When synproxy completes the replayed handshake and modifies the state
into a normal one, it sets both peers' sequence windows. Fix a bug where
the previously advertised windows are applied to the wrong side (i.e.
peer A's seqhi is peer A's seqlo plus peer B's, not A's, window). This
went undetected because mostly the windows are similar and/or re-
advertised soon. But there are (rare) cases where a synproxy'd connection
would stall right after handshake. Found by Gleb Smirnoff.


# 1.24 01-Oct-2006 pavel

In pf, there are lots of #ifdef ALTQ, but our ALTQ is not what pf expects,
and if ALTQ and pf are both enabled, it leads to compile errors. So,
change all tests for ALTQ to ALTQ_NEW, which won't be defined.

This allows simultaneous compilation of pf and ALTQ and is a temporary
measure before the peter-altq brach is merged.

Tested and approved by Peter Postma.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base yamt-pdpolicy-base5 chap-midi-base simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.23 14-May-2006 christos

branches: 1.23.8; 1.23.10;
XXX: GCC uninitialized


Revision tags: elad-kernelauth-base
# 1.22 11-May-2006 mrg

quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.21 19-Feb-2006 peter

branches: 1.21.2; 1.21.4; 1.21.6;
Fix TCP/UDP checksum handling as pointed out by Daniel Hartmeier in:
http://mail-index.netbsd.org/tech-net/2006/01/21/0000.html.

Problem reported and patch tested by der Mouse & Nino Dehne (PR/32874).


# 1.20 07-Feb-2006 rpaulo

In pf_socket_lookup() fix copy & paste problem when in6_pcblookup_bind()
returns NULL.


# 1.19 11-Dec-2005 christos

branches: 1.19.2; 1.19.4; 1.19.6;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.18 23-Oct-2005 christos

Adjust for icmp_error signature.


Revision tags: yamt-vop-base
# 1.17 01-Jul-2005 peter

branches: 1.17.2; 1.17.4;
Resolve conflicts (pf from OpenBSD 3.7, kernel part).


# 1.16 15-Jun-2005 lukem

Use an "XXXGCC -Wuninitalized" style that is consistent with that used
elsewhere in the tree.


# 1.15 14-Jun-2005 jmc

Cleanup XXGCC in a few places to make it easier to see.


# 1.14 13-Jun-2005 jmc

Fix unitialized warnings that only crop up on m68k. XXGCC taggedd


# 1.13 07-May-2005 christos

more fallout from so_uid -> so_uidinfo.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.12 14-Feb-2005 peter

branches: 1.12.4;
Merge in a fix from OPENBSD_3_6.
ok yamt@

> MFC:
> Fix by dhartmei@
>
> ICMP state entries use the ICMP ID as port for the unique state key. When
> checking for a usable key, construct the key in the same way. Otherwise,
> a colliding key might be missed or a state insertion might be refused even
> though it could be inserted. The second case triggers the endless loop
> fixed by 1.474, possibly allowing a NATed LAN client to lock up the kernel.
> Report and test data by Srebrenko Sehic.


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge
# 1.11 21-Dec-2004 peter

branches: 1.11.2; 1.11.4;
Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by dhartmei@

IPv6 packets can contain headers (like options) before the TCP/UDP/ICMP6
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped. Reported by Bernhard Schmidt.

ok deraadt@ dhartmei@ mcbride@


# 1.10 21-Dec-2004 peter

Apply a patch from OPENBSD_3_6 branch (ok yamt).

MFC:
Fix by mcbride@

Initialise init_addr in pf_map_addr() in the PF_POOL_ROUNDROBIN,
prevents a possible endless loop in pf_get_sport() with 'static-port'

Reported by adm at celeritystorm dot com in FreeBSD PR74930, debugging
by dhartmei@

ok mcbride@ dhartmei@ deraadt@ henning@


# 1.9 21-Dec-2004 yamt

pf_check_proto_cksum: use {tcp,udp}_input_checksum so that we can:
- handle loopback checksum omission properly.
- profit from h/w checksum offloading.


Revision tags: kent-audio1-base
# 1.8 05-Dec-2004 peter

Apply a patch from OpenBSD 3.6 branch (ok yamt@).

MFC:
Fix by dhartmei@

fix a bug that leads to a crash when binat rules of the form
'binat from ... to ... -> (if)' are used, where the interface
is dynamic. reported by kos(at)bastard(dot)net, analyzed by
Pyun YongHyeon.


# 1.7 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

The flag to re-filter pf-generated packets was set wrong by synproxy
for ACKs. It should filter the ACK replayed to the server, instead of
of the one to the client.


# 1.6 21-Nov-2004 peter

Apply a patch from the OPENBSD_3_6 branch, ok itojun.

MFC:
Fix by dhartmei@

For RST generated due to state mismatch during handshake, don't set
th_flags TH_ACK and leave th_ack 0, just like the RST generated by
the stack in this case. Fixes the Raptor workaround.


# 1.5 14-Nov-2004 yamt

resolve conflicts. (pf from OpenBSD 3.6, kernel part)


# 1.4 08-Sep-2004 yamt

remove no longer needed caddr_t casts to reduce diffs from openbsd.


# 1.3 22-Jun-2004 martin

branches: 1.3.2;
Fix formatting for 64 bit archs. This fixes PR port-sparc64/26010.
While there, make it compile for non-INET6 aware kernels.


# 1.2 22-Jun-2004 itojun

PF from openbsd 3.5. missing features:
- pfsync (due to protocol # assignment issues)
- carp (not really a PF portion, but thought important to mention)
- PF and ALTQ are mutually-exclusive. this will be sorted out when
kjc@csl.sony.co.jp updates ALTQ and PF (and API inbetween)

reviewed by matt, christos, perry

torture-test is very welcomed.


# 1.1 22-Jun-2004 itojun

branches: 1.1.1;
Initial revision